More comments:
median not mean
-> Median lies at 21!
[check] typo in gender comparisons
[check] no need for additional legends
[check] footnote for each graph - labels VIMP
[check] which variables the samples are restricted to - new descriptives
[check] confirm the correlation
[check] plotly updating problem
[check] plotly layout problem - see exactly what is happening
very specific questions - do attitudes differ as a function of some variable?
**What are the variables that could account for differences in the attitudes?**
For example, whose genetic information participants want to know could indicate something about their curiosity; similarly, the number of items a participant is concerned about could indicate their concern, which can be converted into a concern score.
Notes: DTC genetic testing was reported by only 181 participants in the RU group; likewise for the purpose of genetic testing.
- Comparing these categories to see what differs, the groups were found to differ in curiosity across comparisons.
- Concern did not differ with any classification; however, concern-profile participants differed in GK.
- Therefore curiosity was selected for further exploration.
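A minimal sketch of how such a concern score could be derived: count the concern items each participant ticked, then bin the count into profiles. Column names and bin edges here are illustrative assumptions, not the actual survey variables.

```python
import pandas as pd

# Hypothetical wide responses: one column per concern item (1 = concerned).
# Column names are illustrative, not the actual survey variable names.
responses = pd.DataFrame({
    "concern_privacy":        [1, 0, 1, 1],
    "concern_discrimination": [1, 0, 0, 1],
    "concern_data_storage":   [1, 0, 0, 0],
})

# Concern score = number of items a participant is concerned about
responses["concern_score"] = responses.sum(axis=1)

# Bin the score into the Low/Medium/High concern profiles used below
# (the real cut points are an assumption here)
responses["concern_profile"] = pd.cut(
    responses["concern_score"], bins=[-1, 0, 1, 3],
    labels=["Low_concern", "Medium_concern", "High_concern"],
)
```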
                   High_concern   Medium_concern   Low_concern
High_curiosity     17.32 %        22.43 %          18.24 %
Medium_curiosity   24.66 %        39.52 %          24.57 %
Low_curiosity      24.84 %        31.01 %          18.50 %
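A table like this can be produced with `pd.crosstab`; a toy sketch follows. Which margin the percentages above are normalised over is not stated, so `normalize='columns'` here is an assumption.

```python
import pandas as pd

# Toy profiles only; the real table uses the survey's curiosity/concern profiles
df = pd.DataFrame({
    "curiosity": ["High_curiosity", "Low_curiosity",
                  "Medium_curiosity", "Low_curiosity"],
    "concern":   ["High_concern", "High_concern",
                  "Medium_concern", "Low_concern"],
})
# Within each concern profile, the share of each curiosity profile
pct = pd.crosstab(df["curiosity"], df["concern"], normalize="columns") * 100
```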
More participants are high concern and low curiosity than low concern and high curiosity.
The largest overlap is between medium-concern and medium-curiosity participants.
The second largest overlap is between low-curiosity and medium-concern participants.
High-concern, high-curiosity participants are slightly fewer than low-concern, low-curiosity participants.
Most participants were low confidence and low GK. The low-confidence/low-GK and high-confidence/high-GK groups were nearly equal in size.
- The MCA plot reveals that even though medium-curiosity and medium-concern participants overlap, they differ from each other in terms of inertia.
- The medium- and high-concern categories are closer to each other,
- while medium and high curiosity are farther apart, which means there is more difference from medium to high curiosity than from medium to high concern.
- Student, younger age, low-scoring, and non-legal categories are the farthest out.
- Medium and high concern are closer to non-students than high or medium curiosity are.
- Other categories, including GK, are closer to the origin, which means they should not be selected for further exploration.
- In fact, both confidence profiles vs. GK and confidence profiles vs. curiosity differ with a high statistic.
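For reference, MCA is essentially correspondence analysis on the one-hot indicator matrix; a minimal numpy sketch of the category coordinates (toy data, not the survey categories, and only the standard CA construction, not necessarily the exact implementation used):

```python
import numpy as np
import pandas as pd

# Toy categorical data; the real analysis used the soft/hard category profiles
df = pd.DataFrame({
    "curiosity": ["Low", "Low", "High", "Medium"],
    "concern":   ["High", "Medium", "Low", "Medium"],
})
Z = pd.get_dummies(df).to_numpy(dtype=float)        # indicator matrix
P = Z / Z.sum()                                     # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)                 # row / column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardised residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
# Category coordinates on the first two dimensions; points near the origin
# carry little inertia, distant ones drive the differences
coords = (Vt.T[:, :2] * sv[:2]) / np.sqrt(c)[:, None]
```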
- Categories are hard or soft. A hard division in the data is when we have independent groups based on a particular criterion: for example, a participant is either a student or a non-student. But if we divide the data by variables that classify individual differences, we get overlaps: GK, confidence, curiosity, and concern can all differ individually irrespective of the hard divisions. This means that within hard divisions there are no overlaps, but with soft divisions we can see participants being similar or different in one aspect or another. [Example graph: all categories versus soft categories]
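A tiny illustration of the hard vs soft distinction, with hypothetical participants:

```python
# Hypothetical participants: "student" is a hard division, curiosity and
# concern are soft (individual-difference) categories
participants = [
    {"id": 1, "student": True,  "curiosity": "Low",  "concern": "High"},
    {"id": 2, "student": False, "curiosity": "Low",  "concern": "Low"},
    {"id": 3, "student": True,  "curiosity": "High", "concern": "High"},
]
students     = {p["id"] for p in participants if p["student"]}
non_students = {p["id"] for p in participants if not p["student"]}
low_curious  = {p["id"] for p in participants if p["curiosity"] == "Low"}
high_concern = {p["id"] for p in participants if p["concern"] == "High"}

hard_overlap = students & non_students     # always empty for a hard division
soft_overlap = low_curious & high_concern  # soft categories may overlap
```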
- Once again we see that low- and high-concern participants group together with high confidence and medium curiosity, while low-curiosity, low-confidence, low-scoring, and medium-concern participants form a different cluster.
- It will be interesting to look at the difference between low- and high-curiosity participants: those who want to know everything versus those who are not interested.
- Likewise, medium versus high concern could be an interesting exploration.
Therefore, attitudes do indeed differ with curiosity and concern, but curiosity makes for a more interesting exploration. Most participants are low curiosity but medium or high concern. Low-curiosity participants are also low confidence and low scoring. This is where opinions differ the most.
Clustering method to see how categories organise the themes in the survey
[[Clustergram:file:///home/mrinalmanu/Documents/iglas_le/non_hard_select_all_gr_relations_minus_8_and_9.html]]
A clustergram of soft categories with all GK items revealed that:
Group 1: Medium concern and low curiosity group together with concerns about data storage and genetic discrimination. They agree to revising and updating and to policymaking. They are low confidence and more reserved. [[Reserved]]
Group 2: This cluster is closer to another group, who are medium curiosity and high concern. These participants are neutral towards policymaking, dissemination of GK, and revising and updating. [[Neutral]]
Group 3: Then there is a group of participants who disagree with everything; they think medical facilities or one legal guardian is sufficient in newborn sequencing. These participants also don't know whether there should be a law through which an individual can protect their genetic rights. [[Sporadic]]
Group 4: Finally, there is a cluster of participants who are more open, high scoring, confident, and agree to everything. They are low concern and high curiosity. A consensus for them is that two legal guardians need to agree. [[Open]]
Group 3 is the most interesting to observe, as they strongly disagree but are also not interested in knowing things. This group also includes the minority of participants who give the state and medical facilities the responsibility for newborn screening, who don't want to be labelled as having any deficiency, and who are high in concern.
These groups can be put together to explore the behaviour further. But before that, it is also important to check the soft categories in the data for overlap.
Network-like method to check interaction between all categories two groups at a time
-> All categories were visualised without any distinction between hard and soft categories.
These are not plots of shared participants across categories but of shared responses. It is a network representation of the sankey flow diagram; each central node is the source from which the flow diverges.
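Counting the shared responses that give each edge its weight can be sketched with a pair counter over category memberships (toy data; the real networks were built from the survey categories):

```python
from collections import Counter
from itertools import combinations

# Each participant belongs to one node per grouping; an edge's thickness is
# the number of participants shared by two nodes
memberships = [
    {"High concern", "Low curiosity", "False"},
    {"High concern", "Low curiosity", "False"},
    {"Low concern", "High curiosity", "True"},
]
edges = Counter()
for cats in memberships:
    for a, b in combinations(sorted(cats), 2):
        edges[(a, b)] += 1
# edges[(node_a, node_b)] is then the thickness of that coloured line
```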
Let's look at Concern with Curiosity interactions for those who answered the item about relatedness correctly or incorrectly.
From False = 615: Low curiosity = 307/615 = 49.918 %, Medium curiosity = 169/615 = 27.479 %, High curiosity = 139/615 = 22.602 %
From True = 158: Low curiosity = 74/158 = 46.835 %, Medium curiosity = 35/158 = 22.151 %, High curiosity = 49/158 = 31.012 %
From Low curiosity = 381: Low concern = 97 = 25.459 %, Medium concern = 167 = 43.832 %, High concern = 117 = 30.708 %
From Medium curiosity = 204: Low concern = 52 = 25.49 %, Medium concern = 98 = 48.03 %, High concern = 54 = 26.47 %
From High curiosity = 188: Low concern = 52 = 27.659 %, Medium concern = 83 = 44.148 %, High concern = 53 = 28.191 %
From High concern = 224: To High curiosity = 53 = 23.66 %, To Medium curiosity = 54 = 24.107 %, To Low curiosity = 117 = 52.232 %
From Medium concern = 348: To High curiosity = 83 = 23.85 %, To Medium curiosity = 98 = 28.16 %, To Low curiosity = 167 = 47.988 %
From Low concern = 201: To High curiosity = 52 = 25.87 %, To Medium curiosity = 52 = 25.87 %, To Low curiosity = 97 = 48.259 %
Source totals: Low curiosity = 381, Medium curiosity = 204, High curiosity = 188
Let's further focus on low-curiosity participants, total = 381.
307 of the 615 who answered incorrectly (49.918 %) and 74 of the 158 who answered correctly (46.835 %) are low curiosity.
Of these 381 participants, 117 were high concern (52.232 % of all high-concern participants), 167 were medium concern (47.988 %), and 97 were low concern (48.259 %).
Therefore, an almost equal share of correct and incorrect answerers were low curiosity (slightly more among the incorrect). They were drawn roughly equally from each concern profile, with a slightly larger share from the high-concern end.
Similarly, for medium-curiosity participants, about 5 % more answered the item incorrectly than correctly (27.479 % vs 22.151 %); about 26 % of them were high concern and 25 % low concern, with the largest share (48 %) medium concern.
Finally, nearly 10 % more of the high-curiosity participants answered the item correctly than incorrectly (31.012 % of correct vs 22.602 % of incorrect answerers were high curiosity). High-curiosity participants made up a slightly larger share of the low-concern group (25.87 %) than of the medium- (23.85 %) or high-concern (23.66 %) groups.
In conclusion: high-curiosity participants were about 10 % more likely to answer the item correctly and about 2 % more likely to be low concern; low-curiosity participants were about 3 % less likely to answer the item correctly and nearly 5 % more likely to be high concern; medium-curiosity participants were about 5 % more likely to answer the item incorrectly and slightly more likely to be medium or low concern.
These same calculations can be done rapidly by looking at the corresponding network plots for the sankey diagram. The strength of interaction is represented by the thickness of the coloured line, and the network plot uses the same colour scheme as the sankey plot. Notice that the edge between high curiosity and True (answered correctly) is thicker than the one to False: a visual statement that high-curiosity participants are more likely to answer correctly than incorrectly. Therefore, even though the absolute contribution to a node could be larger, the thicker edge represents the likelihood of belonging to that category.
Looking at the network, we can see that high-concern participants are more likely to be low curiosity. However, notice that the lines are neither too thick nor too thin, which means the contrast is smaller.
An incredibly high contrast is between high and low confidence in relation to the age profile. Older participants are highly likely to be high confidence. Therefore, an important conclusion that can be drawn here is that groups with a big contrast are likely to be significantly different. However, we also have to keep an eye on the contributing numbers: for example, only 64 older participants are high confidence (total = 254), as opposed to 110 low confidence (ratio = 64/110 = 0.5818), and we have nearly twice as many younger participants as older ones. We observe that Age with Confidence was non-significant (p = 0.135, stat = 55985), even though the effect size is large here. The ratio calculated in this case only shows how much larger one quantity is than another; a smaller ratio implies that the group sizes are comparable.
In summary, we can look at the ratio between the groups to say how big the effect size could be. With a smaller ratio and a large edge contrast, we can say with some certainty that the observed effect is significant.
This leads to the final important conclusion: the observed effect between curiosity profiles.
For low curiosity, True = 74 and False = 307 (ratio = 0.24); for medium curiosity, True = 35 and False = 169 (ratio = 0.21); for high curiosity, True = 49 and False = 139 (ratio = 0.35).
In fact, this item was answered correctly by only 21 % of all participants, but high-curiosity participants answered it correctly about 26 % of the time (49/188). This is still below chance level. If we compare these groups, high-curiosity participants would perform significantly better in general.
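Whether correctness really differs across curiosity profiles can be tested directly on the counts above; a chi-square test is one way to do it (the specific test is my suggestion, not necessarily the one used in these analyses):

```python
from scipy.stats import chi2_contingency

# Counts from the flows above: rows = curiosity profile, cols = [True, False]
table = [[74, 307],    # low curiosity
         [35, 169],    # medium curiosity
         [49, 139]]    # high curiosity
chi2, p, dof, expected = chi2_contingency(table)
```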
For graphs with only 3 nodes, comparing ratios can reveal trends; however, we have to be more careful with more shared nodes.
I will give a few more examples that help us establish some predictions and highlight the utility and power of visualising the flows with network-like approaches.
Further examples:
- The high-scoring profile node is shared by 77 older and 200 younger participants (ratio = 0.38)
- The low-scoring profile is shared by 97 older and 399 younger participants (ratio = 0.24)
Yes, the trend is that
[[ GK increases slightly with age; however, at any age more participants have low GK ]]
And, if we compare the groups
[[ GK and age won't be significant, as the strengths of flow are comparable in the two scoring profiles for older and younger participants ]]
Proof: comparing the GK of the two groups by Age yielded a non-significant result (p = 0.031, stat = 2.164)
- Older age is shared by 64 high-confidence and 110 low-confidence participants (ratio = 0.58)
- Younger age is shared by 190 high-confidence and 409 low-confidence participants (ratio = 0.46)
Yes, the trend is that
[[ Older age participants are more confident than younger age participants]]
And, if we compare the groups
[[ Confidence would not vary with age profiles.]]
Conclusion:
[[ Confidence is not related to age; at any age more participants are low confidence in GK, and there is a slight increase in confidence in GK with age ]]
Proof: comparing the Confidence score of the two groups by Age yielded a non-significant result (p = 0.135, stat = 55985)
- The Law branch had the highest medium-concern (132) and high-concern (79) shared connections, with 58 for low concern.
- Other branches had the highest medium-concern (121) and high-concern (82) shared connections, with 89 for low concern.
- Non-students had the highest medium concern (95), then 63 high concern and 54 low concern.
Yes, the trend is that
[[ Most participants are medium and high concern. ]]
The edges connecting all three of these nodes to concern are almost equal in thickness, so the effect should be non-significant.
Proof: comparing Branch with Concern revealed a non-significant result (p = 0.12, stat = 4.242).
Therefore, we have a good enough method to show us the strength of an effect as well as whether there should be a difference. For graphs with only 3 nodes we can also talk about trends using ratios.
Using this approach, I will look at trends for the following questions:
- What is the trend in who should decide and whether participants answered the item correctly or incorrectly?
- What is the trend in agreeing or disagreeing with something on the survey among those who answered the item about relatedness incorrectly or correctly?
- What is the trend in agreeing or disagreeing with something on the survey, together with newborn sequencing decision-making?
- What is the trend in agreeing or disagreeing with something on the survey, together with newborn sequencing decision-making, across curiosity profiles?
To explore the outstanding paths we can do a longest-path analysis.
What is the longest overlap of categories in the data?
More participants imply that more shared paths can be observed.
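The top shared paths can be computed by counting full combinations of category memberships; a toy sketch (column names and values are illustrative):

```python
import pandas as pd

# Toy category assignments; a "path" is a full combination of memberships
df = pd.DataFrame({
    "age":        ["Younger", "Younger", "Older", "Younger"],
    "confidence": ["Low",     "Low",     "High",  "Low"],
    "concern":    ["Medium",  "Medium",  "High",  "High"],
})
# Count how many participants share each complete path, largest first
top_paths = (df.groupby(list(df.columns)).size()
               .sort_values(ascending=False).head(3))
```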
Top 3 paths for categories of those who answered the items incorrectly and correctly
From False (answered incorrectly):
- Younger age, low confidence, law (legal), student, law branch, medium concern, low curiosity, low scoring (39)
- Younger age, low confidence, law (legal), student, law branch, medium concern, high curiosity, low scoring (24)
- Younger age, low confidence, law (legal), student, law branch, high concern, low curiosity, low scoring (20)
From True (answered correctly):
- Older age, high confidence, non-law (legal), not a student, high concern, low curiosity, high scoring (8)
- Younger age, high confidence, non-law (legal), student, other branch, high concern, low curiosity, high scoring (6)
- Younger age, low confidence, non-law (legal), student, other branch, high concern, low curiosity, high scoring (6)
- We can see that law-related, medium-concern, low-curiosity participants answer the question incorrectly the most; law-related participants dominate the top three paths.
- Similarly, older, high-confidence, non-legal non-students answer correctly the most, followed by younger participants with the same categories. High GK scorers include both low- and high-confidence participants.
Top 3 paths for opinions for likert items
COMPLETE ANALYSIS
Analysing Policymaking disagreement and agreement
37 of those who answered incorrectly disagreed with all three items; 8 of those who answered correctly disagreed with all three. Total = 37 + 8 = 45.
11 of the incorrect answerers who disagreed with policymaking agreed with the remaining two (11/37 = 0.297).
6 of the incorrect answerers who disagreed with policymaking also disagreed with revising and updating but agreed with the dissemination of GK (6/37 = 0.162).
This shows that those who disagree more tend to disagree with all three.
Similarly, 373 who answered incorrectly agreed with all 3, and 123 who answered correctly agreed with all 3.
34 of the incorrect answerers agree with policymaking, are neutral towards revising and updating, and agree with dissemination; 16 agree with policymaking, revision, and dissemination; 13 agree with policymaking and with revising and updating but are neutral towards dissemination; 9 agree with policymaking, disagree with revising and updating, but agree with the dissemination of GK.
Therefore, those who answer items incorrectly generally agree with policymaking but are mostly neutral elsewhere, and a small group disagrees only with revising and updating.
From False (answered incorrectly):
- Agree to Policymaking, Agree to Revising and Updating, Agree to dissemination of GK (373)
- Disagree to Policymaking, Disagree to Revising and Updating, Disagree to dissemination of GK (37)
- Agree to Policymaking, Neutral towards Revising and Updating, Agree to dissemination of GK (34)
From True (answered correctly):
- Agree to Policymaking, Agree to Revising and Updating, Agree to dissemination of GK (123)
- Disagree to Policymaking, Disagree to Revising and Updating, Disagree to dissemination of GK (8)
- Most agree with policymaking, but a significant proportion also disagrees with policymaking
- For revising and updating, most participants either agree or disagree
- For dissemination of GK, most participants either agree or disagree
- There is more divergence in the opinions of those who answered the item incorrectly; there are also more participants who answered the item incorrectly
- Those who answered the item correctly either agreed with all or disagreed with all: an all-or-none effect
Top 3 paths for categories of curiosity with opinion on likert and newborn screening
From Low curiosity:
- Two legal guardians need to agree, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (111)
- Prohibited until child has legal capacity, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (71)
- Do not know, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (18)
From Medium curiosity:
- Two legal guardians need to agree, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (83)
- Prohibited until child has legal capacity, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (33)
- One legal guardian sufficient, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (11)
From High curiosity:
- Two legal guardians need to agree, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (69)
- Prohibited until child has legal capacity, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (36)
- Do not know, Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (12)
- Medium-curiosity participants had more shared opinions on newborn screening.
**Top paths for decision for newborn screening with opinions**
- Two legal guardians need to agree: Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (263)
- Prohibited until child has legal capacity: Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (140)
- Do not know: Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (39)
- One legal guardian sufficient: Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (25)
- Medical facilities: Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (16)
- Other: Agree to dissemination of GK, Agree to Policymaking, Agree to Revising and Updating (10)
Sankey + flow divisions:
Two legal guardians need to agree, total = 404 => 404/(615+158) = 0.5226
Flow from incorrect participants = 319; flow from correct participants = 85
Total incorrect participants = 615 => 319/615 = 0.5187
Total correct participants = 158 => 85/158 = 0.5379
Therefore, regardless of answering the item correctly or incorrectly, opinion on newborn decision making remains the same.
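The 0.5187 vs 0.5379 comparison can be formalised as a test on a 2x2 table built from the same counts (the test choice is my suggestion, not necessarily the one used):

```python
from scipy.stats import chi2_contingency

# 2x2: rows = answered incorrectly / correctly; cols = chose "two legal
# guardians need to agree" vs any other option (counts from the flows above)
table = [[319, 615 - 319],
         [85, 158 - 85]]
chi2, p, dof, _ = chi2_contingency(table)
# a large p here would support "opinion is the same regardless of correctness"
```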
Example 2:
- All top paths suggested universal agreement on the three likert items
Conclusion
attempt to describe one of the networks very well... describe exactly how the lines flow
more aware of privacy
[check] labels should be exact
[work in progress] text needs to be followed easily... extras go to the thesis appendices ///
- outline according to the iglas paper
opinions -
paper///
concerns and curiosity -
[[[What are the biggest concerns of the curious people?]]] what would people want to know, why do people differ in how much they want to know, whether they want to know or not
[[[What people want to know? How much they differ (across different categories)?]]]
[[check]] are people more curious about medical stuff only
[[check]] are they more conservative or liberal
[[check]] describe very well... statistics.... decompose by group... by clever clustering... and it should work
[[check]] ONLY in light of curiosity
another about concern
Only key findings
/// it is difficult to understand why people are selecting something /// people with greater or lesser knowledge
[[[ FOR THIS PART To verify this first I need to see exactly which group is contributing how much to agreeing or disagreeing to something. First I combined all strongly agreeing or disagreeing levels to check if there is a persisting pattern in the dataset. The network plot reveals in this case the following pattern: Then I looked at the bar chart with all groupings to see if I can observe the magnitude of difference as well. In this case, Finally, I selected the participants for the two opinion groups and did a test. ]]]
Abstract write
[[[[
What are the opinions about science and genetic science? How do they change with whether participants had genetic testing done or not? How are they related to the different categories among participants who have had genetic testing?
Key findings from themes:
Law participants group with high curious participants on some items but not on others.
Overall it seems there is more disagreement towards dissemination of GK, but network analysis shows that, in fact, participants disagree more with policymaking. (theme + thesis synthesis)
# core data handling
import ast
import datetime
import warnings
from collections import Counter
from itertools import combinations

import numpy as np  # linear algebra
import pandas as pd
import pytz

# plotting
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
%matplotlib inline

# geographic visualisation
import folium
from folium.plugins import MarkerCluster

# sklearn imports (topic modelling and clustering)
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.decomposition import LatentDirichletAllocation, NMF
from sklearn.cluster import AffinityPropagation

# network visualization
import networkx as nx

# interactive plots
import plotly.offline as py
import plotly.graph_objs as go
import plotly.tools as tls
py.init_notebook_mode(connected=True)

# detailed summaries
import sweetviz as sv

warnings.filterwarnings("ignore", category=DeprecationWarning)
pd.options.mode.chained_assignment = None
pd.options.display.max_columns = 999

# region wise analysis imports
mcr_df = pd.read_csv("/home/manu10/Downloads/iglas_work/iGLAS-LE For Mrinal - value names.csv", low_memory=False)
metadata = pd.read_csv("/home/manu10/Downloads/iglas_work/metadata.csv", sep=':', low_memory=False)
col_dg = ['Progress', 'UserLanguage', 'Collection']
col_all = metadata['Variable']
# annotation dataframe (Index.intersection instead of the deprecated `&`)
mcr_df_ann = mcr_df[mcr_df.columns.intersection(col_dg)]
# all variables
mcr_df_all = mcr_df[mcr_df.columns.intersection(col_all)]
# all annotated
mcr_df_all_n = pd.concat([mcr_df_ann, mcr_df_all], axis=1)
mcr_df_all_n["id"] = mcr_df_all_n.index
ncol_dg = ['id', 'Progress', 'UserLanguage', 'Collection']
dfx = pd.melt(mcr_df_all_n, id_vars=list(ncol_dg))
dfx["Variable"] = dfx["variable"]
del dfx["variable"]
t_all = pd.merge(dfx, metadata, on='Variable')
values_t_all = t_all
mcr_df = pd.read_csv("/home/manu10/Downloads/iglas_work/iGLAS-LE.csv", low_memory=False)
# all variables
mcr_df_all = mcr_df[mcr_df.columns.intersection(col_all)]
# all annotated
mcr_df_all_n = pd.concat([mcr_df_ann, mcr_df_all], axis=1)
mcr_df_all_n["id"] = mcr_df_all_n.index
ncol_dg = ['id', 'Progress', 'UserLanguage', 'Collection']
dfx = pd.melt(mcr_df_all_n, id_vars=list(ncol_dg))
dfx["Variable"] = dfx["variable"]
del dfx["variable"]
dfx = dfx[dfx['UserLanguage'] == 'RU']
t_all = pd.merge(dfx, metadata, on='Variable')
codes_t_all = t_all
# all composites
gr_df = codes_t_all
comp_df = gr_df
# filter all empty strings from values (avoid shadowing the builtin `filter`)
comp_df["value"] = comp_df["value"].map(str)
mask = comp_df["value"] != ' '
ndf = comp_df[mask]
mask = ndf["Composite"] == 'Yes'
new_df = ndf[mask]
new_df['Group'].unique()
# drop dataframe rows whose value is not a numeric string
new_df = new_df[new_df['value'].apply(lambda x: str(x).isdigit())]
new_df['value'].unique()
ndf = new_df
###########################################333333333333333333333333333 filter for RU participants
ndf = ndf[ndf['UserLanguage']=='RU']
### 37
filter = ndf["Group"] == 37
ndf_37 = ndf[filter]
ndf_37['value'].replace('1','Not applicable',inplace=True)
ndf_37['value'].replace('2','Charity sector',inplace=True)
ndf_37['value'].replace('3','Construction and maintenance',inplace=True)
ndf_37['value'].replace('4','Education',inplace=True)
ndf_37['value'].replace('5','Engineering Computing and ICT',inplace=True)
ndf_37['value'].replace('6','Communication Advertising and Marketing',inplace=True)
ndf_37['value'].replace('7','Farming and agricultural',inplace=True)
ndf_37['value'].replace('17','Genetics',inplace=True)
ndf_37['value'].replace('8','Governmental employee',inplace=True)
ndf_37['value'].replace('9','Housing and accommodation',inplace=True)
ndf_37['value'].replace('10','Law',inplace=True)
ndf_37['value'].replace('11','Management',inplace=True)
ndf_37['value'].replace('12','Medicine',inplace=True)
ndf_37['value'].replace('13','Retired',inplace=True)
ndf_37['value'].replace('14','Sales and office work',inplace=True)
ndf_37['value'].replace('15','Science and research',inplace=True)
ndf_37['value'].replace('16','Other',inplace=True)
#### 23
filter = ndf["Group"] == 23
ndf_23 = ndf[filter]
ndf_23['Option'] = ndf_23['Option']
### 25
filter = ndf["Group"] == 25
ndf_25 = ndf[filter]
ndf_25['value'].replace('1','Yes1',inplace=True)
ndf_25['value'].replace('2','No1',inplace=True)
ndf_25['value'].replace('3','Do not know1',inplace=True)
### 20
filter = ndf["Group"] == 20
ndf_20 = ndf[filter]
ndf_20['value'].replace('1','General Research',inplace=True)
ndf_20['value'].replace('2','Agronomist',inplace=True)
ndf_20['value'].replace('3','Counselling',inplace=True)
ndf_20['value'].replace('4','Law',inplace=True)
ndf_20['value'].replace('5','Medical',inplace=True)
ndf_20['value'].replace('6','Behavioural',inplace=True)
ndf_20['value'].replace('7','Educational',inplace=True)
ndf_20['value'].replace('8','Other',inplace=True)
### 2
filter = ndf["Group"] == 2
ndf_2 = ndf[filter]
ndf_2['value'].replace('1','Yes2',inplace=True)
ndf_2['value'].replace('2','No2',inplace=True)
ndf_2['value'].replace('3','Do not know2',inplace=True)
### 4
filter = ndf["Group"] == 4
ndf_4 = ndf[filter]
ndf_4['value'].replace('1','Yes3',inplace=True)
ndf_4['value'].replace('2','No3',inplace=True)
ndf_4['value'].replace('3','Do not know3',inplace=True)
### 5
filter = ndf["Group"] == 5
ndf_5 = ndf[filter]
ndf_5['value'].replace('1','Yes4',inplace=True)
ndf_5['value'].replace('2','No4',inplace=True)
ndf_5['value'].replace('3','Do not know4',inplace=True)
### 7
filter = ndf["Group"] == 7
ndf_7 = ndf[filter]
ndf_7['value'].replace('1','Yes5',inplace=True)
ndf_7['value'].replace('2','No5',inplace=True)
ndf_7['value'].replace('3','Do not know5',inplace=True)
### 10
filter = ndf["Group"] == 10
ndf_10 = ndf[filter]
ndf_10['value'].replace('1','Increases the risk of discrimination',inplace=True)
ndf_10['value'].replace('2','Reduces the risk of discrimination',inplace=True)
ndf_10['value'].replace('3','Makes no difference to the risk of discrimination',inplace=True)
ndf_10['value'].replace('4','Both increases and decreases the risk of discrimination depending on how genetics data are regulated',inplace=True)
ndf_10['value'].replace('5','Do not know6',inplace=True)
### 11
filter = ndf["Group"] == 11
ndf_11 = ndf[filter]
ndf_11['value'].replace('1','Benefits outweigh the risks',inplace=True)
ndf_11['value'].replace('2','Risk outweighs the benefits',inplace=True)
ndf_11['value'].replace('3','Risks and benefits balance each-other out',inplace=True)
ndf_11['value'].replace('4','Do not know7',inplace=True)
### 12
filter = ndf["Group"] == 12
ndf_12 = ndf[filter]
ndf_12['value'].replace('1','Always voluntary',inplace=True)
ndf_12['value'].replace('2','Compulsory but under certain circumstances',inplace=True)
ndf_12['value'].replace('3','Do not know8',inplace=True)
### 14
filter = ndf["Group"] == 14
ndf_14 = ndf[filter]
ndf_14['value'].replace('1','Yes9',inplace=True)
ndf_14['value'].replace('2','No9',inplace=True)
ndf_14['value'].replace('3','Not applicable9',inplace=True)
ndf_14['value'].replace('4','Do not know9',inplace=True)
### 15
filter = ndf["Group"] == 15
ndf_15 = ndf[filter]
ndf_15['value'].replace('1','Sufficient10',inplace=True)
ndf_15['value'].replace('2','Insufficient10',inplace=True)
ndf_15['value'].replace('3','Do not know10',inplace=True)
### 33
filter = ndf["Group"] == 33
ndf_33 = ndf[filter]
ndf_33['value'].replace('1','Pre-GCSE school leavers certificates',inplace=True)
ndf_33['value'].replace('2','GCSE or equivalent school leavers certificates',inplace=True)
ndf_33['value'].replace('3','A-level or equivalent',inplace=True)
ndf_33['value'].replace('4','Undergraduate',inplace=True)
ndf_33['value'].replace('5','Of Master',inplace=True)
ndf_33['value'].replace('6','Doctoral degree',inplace=True)
ndf_33['value'].replace('7','Post-doctoral qualification',inplace=True)
### 34
filter = ndf["Group"] == 34
ndf_34 = ndf[filter]
ndf_34['value'].replace('1','Yes11',inplace=True)
ndf_34['value'].replace('2','No11',inplace=True)
### 35
filter = ndf["Group"] == 35
ndf_35 = ndf[filter].copy()
subjects = ['Art and Design', 'Ancient History and Archaeology', 'Biology',
            'Chemistry', 'Classics', 'Communication Advertising and Marketing',
            'Economics and Business Studies', 'Education',
            'Electronics Engineering Computing and ICT', 'English',
            'Environmental Sciences', 'Genetics', 'Geology', 'Geography',
            'Government and Politics', 'Health and Social Care', 'History',
            'Languages', 'Law', 'Mathematics', 'Media Studies', 'Medicine',
            'Music', 'Performance and Theatrical Arts',
            'Philosophy Religion and Ethics', 'Physics', 'Psychology',
            'Sociology', 'Sports and Exercise Science',
            'Statistics and research methods', 'Travel and Tourism', 'Other']
ndf_35['value'] = ndf_35['value'].replace(
    {str(i): s for i, s in enumerate(subjects, start=1)})
### 39
filter = ndf["Group"] == 39
ndf_39 = ndf[filter].copy()
ndf_39['Composite'] = 'Text'
### 44
filter = ndf["Group"] == 44
ndf_44 = ndf[filter].copy()
ndf_44['value'] = ndf_44['value'].replace(
    {'1': 'Primary school', '2': 'Secondary school', '3': 'University'})
### 46
filter = ndf["Group"] == 46
ndf_46 = ndf[filter].copy()
ndf_46['value'] = ndf_46['value'].replace(
    {'1': 'Less than 1 year', '2': '1 to 4 years', '3': '5 to 10 years',
     '4': '11 to 20 years', '5': '21 or more years'})
### 47
filter = ndf["Group"] == 47
ndf_47 = ndf[filter].copy()
ndf_47['Option'] = ndf_47['value']
### 48
filter = ndf["Group"] == 48
ndf_48 = ndf[filter].copy()
ndf_48['value'] = ndf_48['value'].replace(
    {'1': 'Teacher', '2': 'Head teacher', '3': 'Teaching assistant',
     '4': 'Office and admin'})
### 49
filter = ndf["Group"] == 49
ndf_49 = ndf[filter].copy()
ndf_49['value'] = ndf_49['value'].replace(
    {'1': 'English', '2': 'Maths', '3': 'Science', '4': 'Languages',
     '5': 'History', '6': 'Geography', '7': 'Physical Education',
     '8': 'Art and Design', '9': 'Music', '10': 'ICT', '11': 'Drama',
     '12': 'Other'})
### 51
filter = ndf["Group"] == 51
ndf_51 = ndf[filter].copy()
ndf_51['value'] = ndf_51['value'].replace(
    {'1': 'Academic - Lecturer', '2': 'Non-Academic - Administration'})
### 52
filter = ndf["Group"] == 52
ndf_52 = ndf[filter].copy()
subjects = ['Art and Design', 'Ancient History and Archaeology', 'Biology',
            'Chemistry', 'Classics', 'Communication Advertising and Marketing',
            'Economics and Business Studies', 'Education',
            'Electronics Engineering Computing and ICT', 'English',
            'Environmental Sciences', 'Genetics', 'Geology', 'Geography',
            'Government and Politics', 'Health and Social Care', 'History',
            'Languages', 'Law', 'Mathematics', 'Media Studies', 'Medicine',
            'Music', 'Performance and Theatrical Arts',
            'Philosophy Religion and Ethics', 'Physics', 'Psychology',
            'Sociology', 'Sports and Exercise Science',
            'Statistics and research methods', 'Travel and Tourism', 'Other']
ndf_52['value'] = ndf_52['value'].replace(
    {str(i): s for i, s in enumerate(subjects, start=1)})
### 55
# Groups 55 and 56 share the same country coding, so the mapping is built once.
countries = [
    'Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Antigua and Barbuda',  # 1-6
    'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',  # 7-12
    'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize',  # 13-18
    'Benin', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',  # 19-24
    'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia',  # 25-30
    'Cameroon', 'Canada', 'Central African Republic', 'Chad', 'Chile', "People's Republic of China",  # 31-36
    'Colombia', 'Comoros', 'Congo, Republic of the', 'Costa Rica', "Cote d'Ivoire", 'Croatia',  # 37-42
    'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Democratic Republic of the Congo', 'Denmark',  # 43-48
    'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',  # 49-54
    'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Fiji', 'Finland',  # 55-60
    'France', 'Gabon', 'Gambia, The', 'Georgia', 'Germany', 'Ghana',  # 61-66
    'Greece', 'Grenada', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana',  # 67-72
    'Haiti', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India',  # 73-78
    'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy',  # 79-84
    'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati',  # 85-90
    'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Lesotho',  # 91-96
    'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Madagascar',  # 97-102
    'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands',  # 103-108
    'Mauritania', 'Mauritius', 'Mexico', 'Micronesia, Federated States of', 'Monaco', 'Mongolia',  # 109-114
    'Montenegro', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru',  # 115-120
    'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',  # 121-126
    'Norway', 'Oman', 'Pakistan', 'Palau', 'Panama', 'Papua New Guinea',  # 127-132
    'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Qatar',  # 133-138
    'Republic of Korea', 'Republic of Moldova', 'Romania', 'Russia', 'Rwanda', 'Saint Kitts and Nevis',  # 139-144
    'Saint Lucia', 'Saint Vincent and the Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia',  # 145-150
    'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Slovakia',  # 151-156
    'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'Spain', 'Sri Lanka',  # 157-162
    'Sudan', 'Suriname', 'Swaziland', 'Sweden', 'Switzerland', 'Syria',  # 163-168
    'Tajikistan', 'Thailand', 'Macedonia', 'Timor-Leste', 'Togo', 'Tonga',  # 169-174
    'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Tuvalu', 'Uganda',  # 175-180
    'Ukraine', 'United Arab Emirates', 'United Kingdom', 'Tanzania', 'United States', 'Uruguay',  # 181-186
    'Uzbekistan', 'Vanuatu', 'Venezuela', 'Vietnam', 'Yemen', 'Zambia', 'Zimbabwe']  # 187-193
country_map = {str(i): name for i, name in enumerate(countries, start=1)}
filter = ndf["Group"] == 55
ndf_55 = ndf[filter].copy()
ndf_55['value'] = ndf_55['value'].replace(country_map)
### 56
filter = ndf["Group"] == 56
ndf_56 = ndf[filter].copy()
ndf_56['value'] = ndf_56['value'].replace(country_map)
### 30
filter = ndf["Group"] == 30
ndf_30 = ndf[filter].copy()
companies = ['23andMe', '23mofang', '24 genetics', 'African Ancestry',
             'AncestryDNA', 'Atlas', 'Centrillion Biosciences', 'Dante Labs',
             'DNA Ancestry and Family Origin', 'DNA Worldwide',
             'Family Tree DNA', 'Full Genomes Corporation', 'Gene by Gene',
             'Genebase', 'Genera', 'GenoTek', 'Genographic Project',
             'Genos Research Inc', 'Helix', 'iGENEA', 'Living DNA',
             'MyHeritage', 'Oxford Ancestors', 'Roots for Real',
             'Sano Genetics', 'Sorenson Genomics', 'TribeCode',
             'Veritas Genetics', 'Veritas Intercontinental', 'WeGene',
             'YSEQ', 'Yoogene', 'Other', 'Other as Text']
ndf_30['value'] = ndf_30['value'].replace(
    {str(i): c for i, c in enumerate(companies, start=1)})
### 36
filter = ndf["Group"] == 36
ndf_36 = ndf[filter].copy()
ndf_36['value'] = ndf_36['value'].replace(
    {'1': '1 year', '2': '2 years', '3': '3 years',
     '4': '4 years', '5': '5 years', '6': '6+ years'})
### 60
filter = ndf["Group"] == 60
ndf_60 = ndf[filter].copy()
ndf_60['value'] = ndf_60['value'].replace(
    {'1': 'Male', '2': 'Female', '3': 'Gender non-binary',
     '4': 'Prefer not to say'})
### 22
filter = ndf["Group"] == 22
ndf_22 = ndf[filter].copy()
ndf_22['value'] = ndf_22['value'].replace(
    {'1': '75 percent', '2': 'Correct - 50 percent',
     '3': '0.01 percent', '4': '99.9 percent'})
mndf = pd.concat([ndf_22, ndf_33, ndf_34, ndf_35, ndf_36,
ndf_60, ndf_37, ndf_39, ndf_20, ndf_44,
ndf_46, ndf_47, ndf_48, ndf_49, ndf_51,
ndf_52, ndf_55, ndf_56, ndf_2, ndf_4,
ndf_5, ndf_7, ndf_23, ndf_10, ndf_11,
ndf_12, ndf_14, ndf_15, ndf_25, ndf_30]).reset_index()
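Every per-group recode block above repeats the same filter-copy-relabel steps. A small helper would remove the repetition and avoid chained-assignment warnings; this is only a sketch, and `relabel_group` is a hypothetical name that is not part of the original script:

```python
import pandas as pd

def relabel_group(frame, group, mapping):
    """Select one question group and map its coded values to readable labels."""
    sub = frame[frame["Group"] == group].copy()  # copy() avoids SettingWithCopyWarning
    sub["value"] = sub["value"].replace(mapping)
    return sub

# e.g. ndf_15 = relabel_group(ndf, 15,
#          {'1': 'Sufficient10', '2': 'Insufficient10', '3': 'Do not know10'})
```

Each `ndf_XX` block then becomes a single call with its code-to-label dict.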
# for line charts, gender annotations
general_metadata = metadata[metadata['Tag'] == 'General']
gendf = mndf
gendf['Group'] = gendf['Group'].map(str)
filter = gendf["Group"] == '60'
ngendf = gendf[filter]
gendfx = ngendf[['id', 'Option']].copy()
#filter all empty strings from values
mndf["value"] = mndf["value"].map(str)
mndf['value'] = mndf['value'].replace(' ', np.nan)
mndf = mndf.dropna(subset=['value'])
mndf['Option'] = mndf["value"]  # copy value into Option
# all non composites
gr_df = pd.read_csv("/home/manu10/Downloads/iglas_work/T_ALL_THIS_ONE.csv", low_memory=False)
non_comp_df = gr_df
#filter all empty strings from values
non_comp_df["value"] = non_comp_df["value"].map(str)
filter = non_comp_df["value"] != ' '
ndf = non_comp_df[filter]
filter = ndf["Composite"] == 'No'
new_df = ndf[filter].copy()
new_df['Composite'].unique()
nndf = new_df
nndf['value'] = nndf['value'].replace(' ', np.nan)
nndf = nndf.dropna(subset=['value'])
# All continuous
gr_df = pd.read_csv("/home/manu10/Downloads/iglas_work/T_ALL_THIS_ONE.csv", low_memory=False)
non_comp_df = gr_df
#filter all empty strings from values
non_comp_df["value"] = non_comp_df["value"].map(str)
filter = non_comp_df["value"] != ' '
ndf = non_comp_df[filter]
filter = ndf["Composite"] == 'Continuous'
new_df = ndf[filter].copy()
new_df['Composite'].unique()
cmndf = new_df
#filter all empty strings from values
cmndf["value"] = cmndf["value"].map(str)
cmndf['value'] = cmndf['value'].replace(' ', np.nan)
cmndf = cmndf.dropna(subset=['value'])
cmndf['Option'] = cmndf["value"]  # copy value into Option
# all specials
gr_df = pd.read_csv("/home/manu10/Downloads/iglas_work/T_ALL_THIS_ONE.csv", low_memory=False)
non_comp_df = gr_df
#filter all empty strings from values
non_comp_df["value"] = non_comp_df["value"].map(str)
filter = non_comp_df["value"] != ' '
ndf = non_comp_df[filter]
filter = ndf["Composite"] == 'Special'
new_df = ndf[filter].copy()
new_df['Composite'].unique()
specialdf = new_df
specialdf['value'] = specialdf['value'].replace({'+': 'Positive', '-': 'Negative'})
specialdf['Option'] = specialdf['value']
## All together
large_df = pd.concat([mndf, nndf, cmndf, specialdf]).reset_index()
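The three 'Composite' sub-sections above repeat the same cast-filter-dropna steps on the same CSV. They could be folded into one helper; this is a sketch only, `select_composite` is a hypothetical name, and it takes the already-loaded frame rather than re-reading the file each time:

```python
import numpy as np
import pandas as pd

def select_composite(gr_df, kind):
    """Return the rows of one 'Composite' kind with blank values dropped."""
    df = gr_df[gr_df["Composite"] == kind].copy()
    df["value"] = df["value"].map(str).replace(" ", np.nan)  # blanks -> NaN
    return df.dropna(subset=["value"])

# e.g. nndf = select_composite(gr_df, 'No')
#      cmndf = select_composite(gr_df, 'Continuous')
```

Loading the CSV once and selecting each kind from the same frame also avoids three separate `pd.read_csv` passes over the file.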
### 29
ndf_29 = ndf[ndf['Group'] == 29]
ndf_29['Option'] = ndf_29['Option'].map(str)
# map the raw option strings to readable labels
ndf_29['Option'].replace({
    'Medical testing as Selfasinitiated': 'Medical testing - Self-initiated',
    'Medical testing as Recommended for example by doctor': 'Medical testing - Recommended by doctor',
    'Medical testing as Compulsory for example by a court or law enforcement': 'Medical testing - Compulsory by a court or law enforcement',
    'Paternity testing as Selfasinitiated': 'Paternity testing - Self-initiated',
    'Paternity testing as Recommended for example by doctor': 'Paternity testing - Recommended by doctor',
    'Paternity testing as Compulsory for example by a court or law enforcement': 'Paternity testing - Compulsory by a court or law enforcement',
    'Ancestry testing as Selfasinitiated': 'Ancestry testing - Self-initiated',
    'Ancestry testing as Recommended for example by doctor': 'Ancestry testing - Recommended by doctor',
    'Ancestry testing as Compulsory for example by a court or law enforcement': 'Ancestry testing - Compulsory by a court or law enforcement',
    'General interest as Selfasinitiated': 'General interest - Self-initiated',
    'General interest as Recommended for example by doctor': 'General interest - Recommended by doctor',
    'General interest as Compulsory for example by a court or law enforcement': 'General interest - Compulsory by a court or law enforcement',
    'Health and diet as Selfasinitiated': 'Health and diet - Self-initiated',
    'Health and diet as Recommended for example by doctor': 'Health and diet - Recommended by doctor',
    'Health and diet as Compulsory for example by a court or law enforcement': 'Health and diet - Compulsory by a court or law enforcement',
    'As part of a research project as Selfasinitiated': 'Research project - Self-initiated',
    'As part of a research project as Recommended for example by doctor': 'Research project - Recommended by doctor',
    '18': 'Research project - Compulsory by a court or law enforcement',
    'Other as Selfinitiated': 'Other - Self-initiated',
    'Other as Recommended for example by doctor': 'Other - Recommended by doctor',
    'Other as Compulsory for example by a court or law enforcement': 'Other - Compulsory by a court or law enforcement',
    'Other as Text': 'Other',
}, inplace=True)
ndf29x = ndf_29
# all likert items
l_df = codes_t_all
non_comp_df = l_df
# keep only Likert composite items
non_comp_df["Composite"] = non_comp_df["Composite"].map(str)
filter = non_comp_df["Composite"] == 'Likert'
ndf = non_comp_df[filter]
ndf["value"] = ndf["value"].map(str)
filter = ndf["value"] != ' '
ndf = ndf[filter]
### all likert
ndf['value'].replace({'1': 'Strongly disagree', '2': 'Disagree', '3': 'Neutral',
                      '4': 'Agree', '5': 'Strongly agree'}, inplace=True)
ndf['Option'] = ndf['value']
ndf["value"] = ndf["value"].map(str)
filter = ndf["value"] != ' '
l_df = ndf
# all gk items
gk_df = codes_t_all
non_comp_df = gk_df
# keep only GK-tagged items
non_comp_df["Tag"] = non_comp_df["Tag"].map(str)
filter = non_comp_df["Tag"] == 'GK'
ndf = non_comp_df[filter]
ndf["value"] = ndf["value"].map(str)
filter = ndf["value"] != ' '
### item 58
filter = ndf["Group"] == 58
ndf_58 = ndf[filter]
ndf_58['value'].replace('1','A sex chromosome',inplace=True)
ndf_58['value'].replace('2','Correct - The entire sequence of DNA of an individual',inplace=True)
ndf_58['value'].replace('3','All the genes in the DNA',inplace=True)
ndf_58['value'].replace('4','Gene expression',inplace=True)
ndf_58['Option'] = ndf_58['value']
### item 60
filter = ndf["Group"] == 60
ndf_60 = ndf[filter]
ndf_60['value'].replace('1','GPHO',inplace=True)
ndf_60['value'].replace('2','HTPR',inplace=True)
ndf_60['value'].replace('3','Correct - GCTA',inplace=True)
ndf_60['value'].replace('4','LFWE',inplace=True)
ndf_60['Option'] = ndf_60['value']
### item 59
filter = ndf["Group"] == 59
ndf_59 = ndf[filter]
ndf_59['value'].replace('1','Less than 50 percent',inplace=True)
ndf_59['value'].replace('2','75 percent',inplace=True)
ndf_59['value'].replace('3','90 percent',inplace=True)
ndf_59['value'].replace('4','Correct – More than 99 percent',inplace=True)
ndf_59['Option'] = ndf_59['value']
### item 61
filter = ndf["Group"] == 61
ndf_61 = ndf[filter]
ndf_61['value'].replace('1','One gene',inplace=True)
ndf_61['value'].replace('2','Correct – Many genes',inplace=True)
ndf_61['Option'] = ndf_61['value']
### item 62
filter = ndf["Group"] == 62
ndf_62 = ndf[filter]
ndf_62['value'].replace('1','Entirely different',inplace=True)
ndf_62['value'].replace('2','About 50 percent the same',inplace=True)
ndf_62['value'].replace('3','More than 90 percent the same',inplace=True)
ndf_62['value'].replace('4','Correct – One hundred percent identical',inplace=True)
ndf_62['Option'] = ndf_62['value']
### item 63
filter = ndf["Group"] == 63
ndf_63 = ndf[filter]
ndf_63['value'].replace('1','Correct - True',inplace=True)
ndf_63['value'].replace('2','False',inplace=True)
ndf_63['Option'] = ndf_63['value']
### item 64
filter = ndf["Group"] == 64
ndf_64 = ndf[filter]
ndf_64['value'].replace('1','If someone has insomnia this is approximately thirty percent due to their genes',inplace=True)
ndf_64['value'].replace('2','Approximately thirty percent of people will experience insomnia at some point in their lives',inplace=True)
ndf_64['value'].replace('3','Correct – Genetic influences account for approximately thirty percent of differences between people in insomnia',inplace=True)
ndf_64['value'].replace('4','There is an approximately 30 percent chance that someone will pass insomnia onto their children',inplace=True)
ndf_64['Option'] = ndf_64['value']
gk_df = pd.concat([ndf_58, ndf_60, ndf_59, ndf_61, ndf_62, ndf_63, ndf_64]).reset_index()
## All together
gk_df["Option"] = gk_df["Option"].map(str)
filter = gk_df["Option"] != ' '
gk_df = gk_df[filter]
new_large_df = pd.concat([mndf, nndf, cmndf, specialdf, l_df, gk_df]).reset_index()
# assign a score 1 if correct option was selected
gk_df['Valid'] = gk_df['Option'].apply(lambda x: int('Correct' in x) if isinstance(x, str) else 0)
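The row-wise `Correct` check can also be written as a vectorized string test; a minimal sketch on a toy frame (data illustrative):

```python
import pandas as pd

# Toy stand-in for gk_df; option strings are illustrative.
toy = pd.DataFrame({"Option": ["Correct - GCTA", "HTPR", "Correct - True"]})

# str.contains gives a boolean Series; casting to int yields the 0/1 score.
toy["Valid"] = toy["Option"].str.contains("Correct").astype(int)
```

Note that `.str.contains` yields NaN for non-string entries, so cast the column with `.astype(str)` first if it is mixed.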
ndf = gk_df
ndf['UserLanguage'] = ndf['UserLanguage'].map(str)
filter = ndf["UserLanguage"] == 'RU'
ndf = ndf[filter]
##### High/ low scoring participants
# get composite scores
new_df = ndf.groupby(['id'])["Valid"].mean().round(2).reset_index()
new_df.loc[new_df['Valid'] < 0.57, 'Scoring_profile'] = 'Low GK Score'
new_df['Scoring_profile'].fillna('High GK Score', inplace=True)
#### Male female
xdf = new_large_df
## Filters no null values for options, and group is 60 i.e. gender
xdf["value"] = xdf["value"].map(str)
#filter = xdf["Option"] != ' '
#xdf = xdf[filter]
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] == '60'
xdf = xdf[filter]
# multi-filter for gender male and female
select = ['Male', 'Female']
xdf = xdf[xdf['value'].isin(select)]
gen_df = xdf
gen_df['gender'] = xdf['value']
gen_df = gen_df[['id', 'Option', 'gender']].reset_index(level=0, drop=True)
##### Old young
import statistics # will be used later
xdf = new_large_df
## Filters no null values for options, and group is 32 i.e. age
xdf["value"] = xdf["value"].map(str)
#filter = xdf["Option"] != ' '
#xdf = xdf[filter]
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] == '32'
xdf = xdf[filter]
age_df = xdf
age_df['value'] = age_df['value'].map(int)
# split at the sample median age (21), not the mean
median_age = age_df['value'].median()
age_df.loc[age_df['value'] < median_age, 'Age_profile'] = 'Younger'
age_df['Age_profile'].fillna('Older', inplace=True)
age_df = age_df[['id', 'Option', 'Age_profile']].reset_index(level=0, drop=True)
age_df.columns = ['id','Age', 'Age Profile']
#### high low confidence
xdf = new_large_df
## Filters no null values for options, and group is 57 i.e. confidence in GK
xdf["value"] = xdf["value"].map(str)
#filter = xdf["Option"] != ' '
#xdf = xdf[filter]
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] == '57'
xdf = xdf[filter]
conf_df = xdf
conf_df['value']= conf_df['value'].map(int)
ndf = conf_df
ndf['UserLanguage'] = ndf['UserLanguage'].map(str)
filter = ndf["UserLanguage"] == 'RU'
ndf = ndf[filter]
conf_df.loc[conf_df['value'] < 50, 'Conf_profile'] = 'Low confidence'
conf_df['Conf_profile'].fillna('High confident', inplace=True)
conf_df = conf_df[['id', 'Option', 'Conf_profile']].reset_index(level=0, drop=True)
conf_df.columns = ['id', 'Confidence', 'Confidence profile']
conf_df.head(n=3)
#### law non law
xdf = new_large_df
## Filters no null values for options and groups 35, 37, 52 and 20
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] != ' '
xdf = xdf[filter]
select = ['35', '37', '52', '20']
xdf = xdf[xdf['Group'].isin(select)]
# assign a score 1 if Law was selected
xdf['Legal'] = xdf['Option'].apply(lambda x: int('Law' in x) if isinstance(x, str) else 0)
xdf['Legal'] = xdf['Legal'].map(str)
xdf['Legal'] = xdf['Legal'].str.replace('0', 'Non law')
xdf['Legal'] = xdf['Legal'].str.replace('1', 'Law')
law_df = xdf
law_df = law_df[['id', 'Option', 'Legal']].reset_index(level=0, drop=True)
law_df = law_df.drop_duplicates(subset='id', keep="first")
#### student non student
xdf = new_large_df
## Filters no null values for options and group 34 i.e. whether a university student
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] != ' '
xdf = xdf[filter]
select = ['34']
xdf = xdf[xdf['Group'].isin(select)]
#####
nxdf = new_large_df
## Filters no null values for options and group 34
nxdf["Group"] = nxdf["Group"].map(str)
filter = nxdf["Group"] != ' '
nxdf = nxdf[filter]
select = ['34']
nxdf = nxdf[nxdf['Group'].isin(select)]
# assign a score 1 if Yes was selected, else 0
nxdf['student'] = nxdf['Option'].apply(lambda x: int('Yes' in x) if isinstance(x, str) else 0)
nxdf['student'] = nxdf['student'].map(str)
nxdf['student'] = nxdf['student'].str.replace('1', 'Student')
nxdf['student'] = nxdf['student'].str.replace('0', 'Not student')
psnsdist = nxdf[['id', 'Option', 'student']].reset_index(level=0, drop=True)
psnsdist
#msnsdf = pd.merge(psnsdist, snsdist, on='id')
#msnsdf
xdf = psnsdist
xdf['student'] = xdf['student'].map(str)
select = ['Not student']
xdf = xdf[xdf['student'].isin(select)]
not_students = xdf
not_students['branch'] = not_students['student']
not_students['branch'] = 'Not a student'
del not_students['student']
del not_students['Option']
##### law non law non students
xdf = new_large_df
## Filters no null values for options and group 35 i.e. field of education
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] != ' '
xdf = xdf[filter]
select = ['35']
xdf = xdf[xdf['Group'].isin(select)]
#####
nxdf = new_large_df
## Filters no null values for options and group 35
nxdf["Group"] = nxdf["Group"].map(str)
filter = nxdf["Group"] != ' '
nxdf = nxdf[filter]
select = ['35']
nxdf = nxdf[nxdf['Group'].isin(select)]
# assign a score 1 if Law was selected, else 0
nxdf['branch'] = nxdf['Option'].apply(lambda x: int('Law' in x) if isinstance(x, str) else 0)
nxdf['branch'] = nxdf['branch'].map(str)
nxdf['branch'] = nxdf['branch'].str.replace('1', 'Law branch')
nxdf['branch'] = nxdf['branch'].str.replace('0', 'Other branch')
snsdist = nxdf[['id', 'Option', 'branch']].reset_index(level=0, drop=True)
branch_df = pd.concat([snsdist, not_students])
del branch_df['Option']
#### high medium low curious
xdf = new_large_df
## Filters no null values for options and group 24 i.e. genetic curiosity items
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] != ' '
xdf = xdf[filter]
select = ['24']
xdf = xdf[xdf['Group'].isin(select)]
ndf = xdf
ndf['UserLanguage'] = ndf['UserLanguage'].map(str)
filter = ndf["UserLanguage"] == 'RU'
ndf = ndf[filter]
cdf = ndf.groupby(["Group", "Description", "id", "Option"])["value"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+cdf['value'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
cdf['rating'] = nx.iloc[:,2]
wo = []
for i in range(len(cdf['rating'])) :
wo.append(pd.Series(cdf.iloc[i, 5]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
vc = pd.DataFrame(wo)
ndfx = pd.concat([cdf, vc], axis=1)
#del ndfx['Option']
ndfx
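The quoted-join pattern above (joining responses with `','` and wrapping the result in quotes so `ast.literal_eval` parses a tuple of strings) can be sketched in isolation with illustrative values:

```python
import ast
import pandas as pd

# Joining with quoted commas turns a list into a Python tuple literal:
joined = "','".join(["Definitely", "Never", "Definitely"])
literal = "'" + joined + "'"        # "'Definitely','Never','Definitely'"
parsed = ast.literal_eval(literal)  # -> tuple of response strings
counts = pd.Series(parsed).value_counts()
```

One edge case: a single response parses as a plain string rather than a one-element tuple, which the per-row `pd.Series` wrapping in the cell above absorbs.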
hr = pd.read_csv("/home/manu10/Downloads/iglas_work/metadata.csv", sep=':', low_memory=False)
del hr["Description"]
del hr["Group"]
del hr["Composite"]
del hr["Tag"]
t_hr = pd.merge(ndfx, hr, on='Option')
del t_hr["rating"]
#del t_hr["Option"]
del t_hr['Variable']
del t_hr['id']
del t_hr['Definitely']
del t_hr['Under certain circumstances']
del t_hr['Most Likely']
del t_hr['Never']
t_hr
lex = t_hr.set_index(['Group','Description', 'Option'])["value"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+cdf['value'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
t_hr['rating'] = """'"""+t_hr['value']+"""'"""
wo = []
for i in range(len(cdf['rating'])) :
wo.append(pd.Series(t_hr.iloc[i, 4]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
vc = pd.DataFrame(wo)
ndfx = pd.concat([t_hr, vc], axis=1)
#del ndfx['Option']
del ndfx['rating']
lex = ndfx.groupby(['Group','Description','Option', 'value']).count().reset_index()
#lex['heatmap'] = lex['Definitely']+lex['Under certain circumstances']+lex['Most Likely']+lex['Never']
df = pd.read_csv('/home/manu10/Downloads/iglas_work/item_24', sep='\t')
df.index = df.Option
del df['Option']
df = df.apply(pd.to_numeric, errors='coerce')
df = df.apply(lambda g: g / g.sum()).round(2).reset_index()
df.index = df.Option
del df['Option']
ndf = xdf
ndf['UserLanguage'] = ndf['UserLanguage'].map(str)
filter = ndf["UserLanguage"] == 'RU'
ndf = ndf[filter]
cdf = ndf.groupby(["Group", "Description", "id", "Option"])["value"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+cdf['value'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
cdf['rating'] = nx.iloc[:,2]
wo = []
for i in range(len(cdf['rating'])) :
wo.append(pd.Series(cdf.iloc[i, 5]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
vc = pd.DataFrame(wo)
ndfx = pd.concat([cdf, vc], axis=1)
#del ndfx['Option']
ndfx
hr = pd.read_csv("/home/manu10/Downloads/iglas_work/metadata.csv", sep=':', low_memory=False)
del hr["Description"]
del hr["Group"]
del hr["Composite"]
del hr["Tag"]
t_hr = pd.merge(ndfx, hr, on='Option')
del t_hr["rating"]
#del t_hr["Option"]
del t_hr['Variable']
#del t_hr['id']
del t_hr['Definitely']
del t_hr['Under certain circumstances']
del t_hr['Most Likely']
del t_hr['Never']
t_hr
lex = t_hr.set_index(['Group','Description', 'Option'])["value"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+cdf['value'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
t_hr['rating'] = """'"""+t_hr['value']+"""'"""
wo = []
for i in range(len(cdf['rating'])) :
wo.append(pd.Series(t_hr.iloc[i, 5]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
vc = pd.DataFrame(wo)
ndfx = pd.concat([t_hr, vc], axis=1)
#del ndfx['Option']
del ndfx['rating']
ndfx = ndfx.fillna(0)
### Assigning scores here
ndfx['Definitely'] = ndfx['Definitely']*100
ndfx['Under certain circumstances'] = ndfx['Under certain circumstances']*33
ndfx['Most Likely'] = ndfx['Most Likely']*66
ndfx['Never'] = ndfx['Never']*0
lex = ndfx.groupby(["id"]).sum().reset_index()
# get row sum
lex['curious_score'] = lex.iloc[:, 1:5].sum(axis=1)
# scale the curiosity score to a 0-100 range
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler(feature_range=(0,100))
lex['curious_score_scaled'] = scaler.fit_transform(lex[["curious_score"]])
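`MinMaxScaler(feature_range=(0, 100))` is a linear rescale of the column onto its own min/max; the arithmetic it performs can be reproduced directly (toy values):

```python
import numpy as np

# Equivalent of MinMaxScaler(feature_range=(0, 100)) on a single column:
# scaled = (x - min) / (max - min) * 100
x = np.array([10.0, 20.0, 30.0])
scaled = (x - x.min()) / (x.max() - x.min()) * 100
```

Because the scale is fit on this sample, the cut points used below are relative to the observed minimum and maximum, not absolute scores.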
lex = lex[lex['curious_score_scaled'] < 31]
cond = [lex['curious_score_scaled'] < 10, lex['curious_score_scaled'].between(10, 18), lex['curious_score_scaled'] >= 18]
choice = ['Low Genetic Curiosity', 'Medium Genetic Curiosity', 'High Genetic Curiosity']
lex['curiosity'] = np.select(cond, choice)
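`np.select` evaluates the conditions in order and labels each row by the first one that matches; a small sketch with illustrative scores:

```python
import numpy as np
import pandas as pd

# Toy stand-in for curious_score_scaled.
s = pd.Series([5, 12, 25])
cond = [s < 10, s.between(10, 18), s >= 18]
choice = ['Low', 'Medium', 'High']
labels = np.select(cond, choice)
```

Since `between` is inclusive at both ends, a score of exactly 18 satisfies the last two conditions and gets the earlier label ('Medium').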
nlex = lex
del nlex['Definitely']
del nlex['Under certain circumstances']
del nlex['Most Likely']
del nlex['Never']
curious_df = nlex.reset_index()
##### high medium low concern
xdf = new_large_df
## Filters no null values for options and group 27 i.e. concern items
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] != ' '
xdf = xdf[filter]
select = ['27']
xdf = xdf[xdf['Group'].isin(select)]
ndf = xdf
ndf['UserLanguage'] = ndf['UserLanguage'].map(str)
filter = ndf["UserLanguage"] == 'RU'
ndf = ndf[filter]
cdf = ndf.groupby(["Group", "Description", "id", "Option"])["value"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+cdf['value'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
cdf['rating'] = nx.iloc[:,2]
wo = []
for i in range(len(cdf['rating'])) :
wo.append(pd.Series(cdf.iloc[i, 5]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
vc = pd.DataFrame(wo)
ndfx = pd.concat([cdf, vc], axis=1)
#del ndfx['Option']
ndfx.head(3)
del ndfx['Group']
del ndfx['Description']
del ndfx['value']
del ndfx['rating']
del ndfx["""I’m not interested"""] #i'm not interested column is being deleted here
lex = ndfx.groupby(['id']).count().reset_index()
lex['concern_score'] = lex['Option']
glex = lex[['concern_score', 'id']]
# scale the concern score to a 0-100 range
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler(feature_range=(0,100))
glex['concern_score_scaled'] = scaler.fit_transform(lex[["concern_score"]])
glex['concern'] = glex['concern_score_scaled']
cond = [glex['concern'] < 14, glex['concern'].between(14, 29), glex['concern'] >= 29]
choice = ['Low Concern', 'Medium Concern', 'High Concern']
glex['concern'] = np.select(cond, choice)
concern_df = glex.reset_index()
################################################ combining annotations
from functools import reduce
dfs = [new_df, gen_df, age_df, conf_df, law_df, psnsdist, branch_df, concern_df, curious_df]
df_final = reduce(lambda left,right: pd.merge(left,right,on='id'), dfs)
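`reduce` chains `pd.merge` pairwise across the list; since the default join is inner, only ids present in every annotation frame survive. A toy sketch (frames illustrative):

```python
from functools import reduce
import pandas as pd

# Three toy annotation frames sharing an 'id' key.
a = pd.DataFrame({"id": [1, 2], "gender": ["F", "M"]})
b = pd.DataFrame({"id": [1, 2], "age": [21, 41]})
c = pd.DataFrame({"id": [2], "concern": ["High"]})

# Inner-join them pairwise: id 1 is missing from c, so it drops out.
merged = reduce(lambda left, right: pd.merge(left, right, on="id"), [a, b, c])
```

This is why the merged sample can be smaller than any individual classification: a participant missing from one frame is excluded from all downstream comparisons.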
#df_final['id'] = df_final.index
#df_final = df_final.reset_index()
cadf = df_final.drop_duplicates(subset='id', keep="last")
del cadf['Option_x']
del cadf['Option_y']
del cadf['index_x']
del cadf['index_y']
del cadf['Option']
cadf.head(n=2)
xdf = new_large_df
xdf['Progress']= xdf['Progress'].map(int)
filter = xdf["Progress"] > 75
xdf = xdf[filter]
# merging with new_large_df
annotated_df = pd.merge(cadf, xdf, on='id')
del annotated_df['level_0']
del annotated_df['index']
annotated_df.head(n=5)
len(annotated_df.id.unique())
773
annotated_df
| id | Valid | Scoring_profile | gender | Age | Age Profile | Confidence | Confidence profile | Legal | student | branch | concern_score | concern_score_scaled | concern | curious_score | curious_score_scaled | curiosity | Progress | UserLanguage | Collection | value | Variable | Description | Option | Group | Composite | Tag | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.17 | Low GK Score | Female | 41 | Older | 10 | Low confidence | Non law | Student | Other branch | 1 | 0.000000 | Low Concern | 465.0 | 24.068323 | High Genetic Curiosity | 100 | RU | Pilot | 99.9 percent | LE5.015 | People differ in the amount of DNA they share.... | 99.9 percent | 22 | Yes | GK |
| 1 | 0 | 0.17 | Low GK Score | Female | 41 | Older | 10 | Low confidence | Non law | Student | Other branch | 1 | 0.000000 | Low Concern | 465.0 | 24.068323 | High Genetic Curiosity | 100 | RU | Pilot | Of Master | LE2.059 | Education | Of Master | 33 | Yes | General |
| 2 | 0 | 0.17 | Low GK Score | Female | 41 | Older | 10 | Low confidence | Non law | Student | Other branch | 1 | 0.000000 | Low Concern | 465.0 | 24.068323 | High Genetic Curiosity | 100 | RU | Pilot | Yes11 | LE2.060 | University student | Yes11 | 34 | Yes | General |
| 3 | 0 | 0.17 | Low GK Score | Female | 41 | Older | 10 | Low confidence | Non law | Student | Other branch | 1 | 0.000000 | Low Concern | 465.0 | 24.068323 | High Genetic Curiosity | 100 | RU | Pilot | Education | LE2.061 | Field of education | Education | 35 | Yes | General |
| 4 | 0 | 0.17 | Low GK Score | Female | 41 | Older | 10 | Low confidence | Non law | Student | Other branch | 1 | 0.000000 | Low Concern | 465.0 | 24.068323 | High Genetic Curiosity | 100 | RU | Pilot | 1 year | LE2.062 | Year of education | 1 year | 36 | Yes | General |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 47970 | 1888 | 0.43 | Low GK Score | Female | 49 | Older | 20 | Low confidence | Non law | Not student | Not a student | 3 | 28.571429 | Medium Concern | 99.0 | 5.124224 | Low Genetic Curiosity | 100 | RU | Moscow Teachers | 90 percent | LE5.017 | On average, how much of their total DNA is the... | 90 percent | 59 | Yes | GK |
| 47971 | 1888 | 0.43 | Low GK Score | Female | 49 | Older | 20 | Low confidence | Non law | Not student | Not a student | 3 | 28.571429 | Medium Concern | 99.0 | 5.124224 | Low Genetic Curiosity | 100 | RU | Moscow Teachers | One gene | LE5.018 | Genetic contribution to the risk of developing... | One gene | 61 | Yes | GK |
| 47972 | 1888 | 0.43 | Low GK Score | Female | 49 | Older | 20 | Low confidence | Non law | Not student | Not a student | 3 | 28.571429 | Medium Concern | 99.0 | 5.124224 | Low Genetic Curiosity | 100 | RU | Moscow Teachers | Correct – One hundred percent identical | LE5.024 | The DNA sequence in two different cells, for e... | Correct – One hundred percent identical | 62 | Yes | GK |
| 47973 | 1888 | 0.43 | Low GK Score | Female | 49 | Older | 20 | Low confidence | Non law | Not student | Not a student | 3 | 28.571429 | Medium Concern | 99.0 | 5.124224 | Low Genetic Curiosity | 100 | RU | Moscow Teachers | Correct - True | LE5.030 | Some of the genes that relate to dyslexia also... | Correct - True | 63 | Yes | GK |
| 47974 | 1888 | 0.43 | Low GK Score | Female | 49 | Older | 20 | Low confidence | Non law | Not student | Not a student | 3 | 28.571429 | Medium Concern | 99.0 | 5.124224 | Low Genetic Curiosity | 100 | RU | Moscow Teachers | There is an approximately 30 percent chance th... | LE5.031 | If a report states ‘the heritability of insomn... | There is an approximately 30 percent chance th... | 64 | Yes | GK |
47975 rows × 27 columns
import matplotlib.pyplot as plt
annotated_df['UserLanguage'] = annotated_df['UserLanguage'].map(str)
filter = annotated_df['UserLanguage'] == 'RU'
filtered_annotated_df = annotated_df[filter]
afx = filtered_annotated_df.iloc[:,0:17]
afx = afx.drop_duplicates(subset='id')
afx = afx[['id', 'Scoring_profile', 'gender', 'Age Profile', 'Confidence profile', 'Legal', 'student', 'branch', 'concern', 'curiosity']]
afx.head(2)
afx = afx.melt(id_vars=['id'],
value_vars=['Scoring_profile', 'gender', 'Age Profile', 'Confidence profile', 'Legal',
'student', 'branch', 'concern', 'curiosity'],
var_name='Description', value_name='Option')
afx.head(2)
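`melt` turns the one-row-per-participant frame into one row per participant-classification pair; a toy sketch with two classifications:

```python
import pandas as pd

# Toy wide frame: one row per participant, one column per classification.
wide = pd.DataFrame({"id": [0, 1],
                     "gender": ["Female", "Male"],
                     "concern": ["Low Concern", "High Concern"]})

# Each classification column becomes a (Description, Option) row pair.
long = wide.melt(id_vars=["id"], value_vars=["gender", "concern"],
                 var_name="Description", value_name="Option")
```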
afx['Variable'] = 'Class_X'
afx['Group'] = '77'
afx.head(2)
| id | Description | Option | Variable | Group | |
|---|---|---|---|---|---|
| 0 | 0 | Scoring_profile | Low GK Score | Class_X | 77 |
| 1 | 1 | Scoring_profile | High GK Score | Class_X | 77 |
afx['Description'] = afx['Description'].map(str)
afx['Description'].unique()
array(['Scoring_profile', 'gender', 'Age Profile', 'Confidence profile',
'Legal', 'student', 'branch', 'concern', 'curiosity'], dtype=object)
afx['Description'].replace({
    'Scoring_profile': 'GK Score',
    'gender': 'Gender',
    'Age Profile': 'Age',
    'Confidence profile': 'Confidence in GK',
    'Legal': 'Related/ Not related to law',
    'student': 'Students/ Non Students',
    'branch': 'Law or Non Law Students and Non Students',
    'concern': 'Concern',
    'curiosity': 'Genetic Curiosity',
}, inplace=True)
afx['Option'] = afx['Option'].map(str)
afx['Option'].unique()
array(['Low GK Score', 'High GK Score', 'Female', 'Male', 'Older',
'Younger', 'Low confidence', 'High confident', 'Non law', 'Law',
'Student', 'Not student', 'Other branch', 'Not a student',
'Law branch', 'Low Concern', 'Medium Concern', 'High Concern',
'High Genetic Curiosity', 'Low Genetic Curiosity',
'Medium Genetic Curiosity'], dtype=object)
afx['Option'].replace({
    'Female': 'Female Participants',
    'Male': 'Male Participants',
    'Older': 'Older Participants',
    'Younger': 'Younger Participants',
    'Low confidence': 'Low GK Confidence',
    'High confident': 'High GK Confidence',
    'Non law': 'Participants not related to law',
    'Law': 'Participants related to law',
    'Student': 'Students',
    'Not student': 'Not Students',
    'Other branch': 'Non Law Students',
    'Law branch': 'Law Students',
    'Not a student': 'Not Students',
}, inplace=True)
subset_fx = filtered_annotated_df[['id', 'Description', 'Option', 'Variable', 'Group']]
subset_fx.head()
concat_df = pd.concat([afx,subset_fx], axis=0)
concat_df.head(2)
list_of_values = afx.id.unique()
concat_df['id'] = concat_df['id'].map(int)
select_df = concat_df[concat_df['id'].isin(list_of_values)]
len(select_df.id.unique())
list_gp = ['25', '27', '29', '30']
other_df = new_large_df[new_large_df['Group'].isin(list_gp)]
other_df = other_df[['id', 'Description', 'Option', 'Variable', 'Group']].copy()
other_df[other_df['Group'] == '30']
# while ndf
ndf = select_df
# group filter
ndf['Group'] = ndf['Group'].map(str)
select = ['23', '24', '65', '66', '67']
ndf["Group"] = ndf["Group"].map(str)
ndf = ndf[ndf['Group'].isin(select)]
ndf.shape
ndf = pd.concat([ndf, other_df]).reset_index()
ndf.shape
ndf[ndf['Group'] == '30']
### correcting labels
### 23
filter = ndf["Group"] == '23'
ndf_23 = ndf[filter]
ndf_23['Option'] = ndf_23['Option'].map(str)
ndf_23['Option'].replace('1','One legal guardian sufficient',inplace=True)
ndf_23['Option'].replace('2','Two legal guardians need to agree',inplace=True)
ndf_23['Option'].replace('3','Medical facilities',inplace=True)
ndf_23['Option'].replace('4','The State',inplace=True)
ndf_23['Option'].replace('5','Prohibited until child has legal capacity',inplace=True)
ndf_23['Option'].replace('6','Do not know',inplace=True)
ndf_23['Option'].replace('7','Other',inplace=True)
###
filter = ndf["Group"] == '65'
ndf_65 = ndf[filter]
ndf_65['Option'] = ndf_65['Option'].map(str)
ndf_65['Option'].replace('Agree','Agree to dissemination of GK',inplace=True)
ndf_65['Option'].replace('Strongly agree','Strongly agree to dissemination of GK',inplace=True)
ndf_65['Option'].replace('Neutral','Neutral towards dissemination of GK',inplace=True)
ndf_65['Option'].replace('Disagree','Disagree to dissemination of GK',inplace=True)
ndf_65['Option'].replace('Strongly disagree','Strongly disagree to dissemination of GK',inplace=True)
###
filter = ndf["Group"] == '66'
ndf_66 = ndf[filter]
ndf_66['Option'] = ndf_66['Option'].map(str)
ndf_66['Option'].replace('Agree','Agree to Policymaking',inplace=True)
ndf_66['Option'].replace('Strongly agree','Strongly agree to Policymaking',inplace=True)
ndf_66['Option'].replace('Neutral','Neutral towards Policymaking',inplace=True)
ndf_66['Option'].replace('Disagree','Disagree to Policymaking',inplace=True)
ndf_66['Option'].replace('Strongly disagree','Strongly disagree to Policymaking',inplace=True)
###
filter = ndf["Group"] == '67'
ndf_67 = ndf[filter]
ndf_67['Option'] = ndf_67['Option'].map(str)
ndf_67['Option'].replace('Agree','Agree to Revising and Updating',inplace=True)
ndf_67['Option'].replace('Strongly agree','Strongly agree to Revising and Updating',inplace=True)
ndf_67['Option'].replace('Neutral','Neutral towards Revising and Updating',inplace=True)
ndf_67['Option'].replace('Disagree','Disagree to Revising and Updating',inplace=True)
ndf_67['Option'].replace('Strongly disagree','Strongly disagree to Revising and Updating',inplace=True)
###
filter = ndf["Group"] == '24'
ndf_24 = ndf[filter]
###
filter = ndf["Group"] == '25'
ndf_25 = ndf[filter]
ndf_25['Option'] = ndf_25['Option'].map(str)
ndf_25['Option'].replace('Yes1','Yes there should be a law',inplace=True)
ndf_25['Option'].replace('No1','No there should not be a law',inplace=True)
###
filter = ndf["Group"] == '27'
ndf_27 = ndf[filter]
###
filter = ndf["Group"] == '29'
ndf_29 = ndf[filter]
ndf_29['Option'] = ndf_29['Option'].map(str)
###
filter = ndf["Group"] == '30'
ndf_30 = ndf[filter]
ndf_30['Option'] = ndf_30['Option'].map(str)
select_df['Group'] = select_df['Group'].map(str)
select = ['77']
cps = select_df[select_df['Group'].isin(select)]
## ever had genetic testing done
ndf_29_new = pd.merge(ndf_29, ndf29x, on='id')
ndf_29_new = ndf_29_new.drop_duplicates(subset=['id', 'Option_y'])
ndf_29_new['Option'] = ndf_29_new['Option_y']
ndf_29_new['Variable'] = ndf_29_new['Variable_y']
ndf_29_new['Group'] = ndf_29_new['Group_y']
ndf_29_new['Description'] = ndf_29_new['Description_y']
ndf_29_new = ndf_29_new[['id', 'Description', 'Option', 'Variable', 'Group']].copy()
ndf_29_new['Option'].replace('Other', 'No', inplace=True)
ndf_29_new.head(2)
| id | Description | Option | Variable | Group | |
|---|---|---|---|---|---|
| 0 | 5 | Have you ever had genetic testing and why? | Medical testing - Self-initiated | LE2.003 | 29 |
| 1 | 5 | Have you ever had genetic testing and why? | Ancestry testing - Self-initiated | LE2.009 | 29 |
# Join each group's non-empty options into one comma-separated, quoted string
cdf = ndf_29_new.groupby(["Group", "Description", "Variable"])["Option"].agg(lambda x: "','".join(x[x != ''])).reset_index()
# Wrap the joined string in outer quotes so each row parses as a tuple of strings
nx = ("'" + cdf['Option'].astype(str) + "'").apply(lambda x: pd.Series(x)).stack().reset_index()
cdf['rating'] = nx.iloc[:, 2]
# Parse each row's string back into a tuple and count how often each option occurs
wo = []
for i in range(len(cdf['rating'])):
    wo.append(pd.Series(cdf.iloc[i, 4]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
vc = pd.DataFrame(wo)
ndfx = pd.concat([cdf, vc], axis=1)
del ndfx['Option']
del ndfx['rating']
lex = ndfx.set_index(['Group','Description','Variable']).stack().reset_index()
lex["Option"] = lex['level_3']
lex["Count"] = lex[0]
del lex['level_3']
del lex[0]
x = lex.groupby(['Group','Description', 'Variable', 'Option'])['Count'].mean().round(2)
xf = x.groupby(level=[0, 1]).apply(lambda g: g / g.sum()).round(2).reset_index()
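The string-join / `ast.literal_eval` round-trip above rebuilds per-group option counts the hard way. A minimal sketch with toy data (the frame and values are hypothetical stand-ins for `ndf_29_new`) shows the same counts coming straight from `groupby` + `value_counts`:

```python
import pandas as pd

# Toy frame standing in for ndf_29_new: one row per (participant, selected option).
toy = pd.DataFrame({
    "Group": ["29"] * 5,
    "Description": ["Have you ever had genetic testing and why?"] * 5,
    "Option": ["Medical", "Ancestry", "Medical", "Medical", "Ancestry"],
})

# Count how often each Option was selected within each Group/Description,
# without round-tripping through quoted strings and ast.literal_eval.
counts = (toy.groupby(["Group", "Description"])["Option"]
             .value_counts()
             .rename("Count")
             .reset_index())
```

The `.rename("Count")` avoids the name clash between the counted column and the resulting series when calling `reset_index()`.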
comp_df = ndf_29_new
# restrict to Group 29 rows
comp_df["Group"] = comp_df["Group"].map(str)
mask = comp_df["Group"] == '29'
ndf = comp_df[mask]
# pie chart: reasons for having genetic testing
temp_series = ndf['Option'].value_counts()
labels = (np.array(temp_series.index))
sizes = (np.array((temp_series / temp_series.sum())*100))
trace = go.Pie(labels=labels, values=sizes)
layout = go.Layout(
title="LE2.003: Have you ever had genetic testing and why? N={}".format(len(comp_df.id.unique()))
)
data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
df = ndf_29_new
df['Option'] = df['Option'].map(str)
xx = df['Option'].str.split('-', n=1, expand=True)
comp_df = xx
temp_series = xx[0].value_counts()
labels = (np.array(temp_series.index))
sizes = (np.array((temp_series / temp_series.sum())*100))
trace = go.Pie(labels=labels, values=sizes)
layout = go.Layout(
title="LE2.003: Have you ever had genetic testing and why? (N={})".format(len(df.id.unique()))
)
data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
df = ndf_29_new
df['Option'] = df['Option'].map(str)
xx = df['Option'].str.split('-', n=1, expand=True)
comp_df = xx
temp_series = xx[1].value_counts()
labels = (np.array(temp_series.index))
sizes = (np.array((temp_series / temp_series.sum())*100))
trace = go.Pie(labels=labels, values=sizes)
layout = go.Layout(
title="LE2.003: Have you ever had genetic testing and why? (N={})".format(len(df.id.unique()))
)
data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
comp_df = ndf_30
# restrict to Group 30 rows
comp_df["Group"] = comp_df["Group"].map(str)
mask = comp_df["Group"] == '30'
ndf = comp_df[mask]
# pie chart: DTC genetic-testing companies used
temp_series = ndf['Option'].value_counts()
labels = (np.array(temp_series.index))
sizes = (np.array((temp_series / temp_series.sum())*100))
trace = go.Pie(labels=labels, values=sizes)
layout = go.Layout(
title="Variables LE2.25-57: If you have used DTC genetic testing, <br>which company did you use? (N={})".format(len(comp_df.id.unique()))
)
data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
comp_df = ndf_25
# restrict to Group 25 rows
comp_df["Group"] = comp_df["Group"].map(str)
mask = comp_df["Group"] == '25'
ndf = comp_df[mask]
# pie chart: support for a law on protecting one's own genetic data
temp_series = ndf['Option'].value_counts()
labels = (np.array(temp_series.index))
sizes = (np.array((temp_series / temp_series.sum())*100))
trace = go.Pie(labels=labels, values=sizes)
layout = go.Layout(
title="LE3.141: Should there be a law regulating how a person protects their<br> own genetic data? (N={})".format(len(comp_df.id.unique()))
)
data = [trace]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
list_to_keep = ndf_23.id.unique()
sdf = specialdf[specialdf['id'].isin(list_to_keep)]
sdf = sdf.drop(['Progress', 'UserLanguage', 'Collection', 'value', 'Composite', 'Tag'], axis=1)
pdx = nndf
#cps['Option'] = cps['Option']+' '+cps['Description']
cps.Option.unique()
megadf = pd.concat([cps ,ndf_23, ndf_24, ndf_65, ndf_66, ndf_67, ndf_25, ndf_27, ndf_29_new, ndf_30, sdf]).reset_index()
del megadf['level_0']
del megadf['index']
megadf['Group'] = megadf['Group'].map(str)
megadf.Variable.unique()
array(['Class_X', 'LE3.087', 'LE3.101', 'LE3.102', 'LE3.103', 'LE3.104',
'LE3.105', 'LE3.106', 'LE3.107', 'LE3.199', 'LE3.200', 'LE3.201',
'LE3.141', 'LE2.122', 'LE2.123', 'LE2.124', 'LE2.125', 'LE2.126',
'LE2.127', 'LE2.128', 'LE2.129', 'LE2.130', 'LE2.003', 'LE2.009',
'LE2.006', 'LE2.004', 'LE2.012', 'LE2.015', 'LE2.016', 'LE2.018',
'LE2.022', 'LE2.013', 'LE2.019', 'LE2.021', 'LE2.005', 'LE2.014',
'LE2.020', 'LE2.008', 'LE2.007', 'LE2.010', 'LE2.017', 'LE2.024',
'LE2.011', 'LE2.023', 'LE3.045', 'LE3.046', 'LE3.047', 'LE3.048',
'LE3.049', 'LE3.050', 'LE3.051', 'LE3.052', 'LE3.053', 'LE3.054',
'LE3.055', 'LE3.056', 'LE3.057', 'LE3.058', 'LE3.059', 'LE3.060',
'LE3.061', 'LE3.062', 'LE3.063', 'LE3.064', 'LE3.066', 'LE3.067',
'LE3.068', 'LE3.069', 'LE3.070', 'LE3.071', 'LE3.072', 'LE3.073',
'LE3.074', 'LE3.075', 'LE3.076', 'LE3.077', 'LE3.078', 'LE3.079',
'LE3.080', 'LE3.081', 'LE3.082', 'LE3.083', 'LE3.084', 'LE3.085'],
dtype=object)
options = megadf.Group.unique()
ranges = list(range(0, len(options)))
# get categorical codes
categories = dict(zip(options,ranges))
categories
{'77': 0,
'23': 1,
'24': 2,
'65': 3,
'66': 4,
'67': 5,
'25': 6,
'27': 7,
'29': 8,
'30': 9,
'8': 10,
'9': 11}
## map categories onto Groups
megadf['Group'] = megadf['Group'].map(str)
megadf['Group'] = megadf['Group'].map(categories)
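The hand-built `categories` dict above (zip of unique values with a range, then `.map`) is exactly what pandas' categorical codes provide. A small sketch with toy values (stand-ins for `megadf.Group`):

```python
import pandas as pd

options = ["77", "23", "24", "65"]        # stand-in for megadf.Group.unique()
s = pd.Series(["23", "77", "65", "23"])   # stand-in for the Group column

# Equivalent of dict(zip(options, range(len(options)))) followed by .map():
# each value is replaced by its position in the `options` list.
codes = pd.Categorical(s, categories=options).codes
```

Unseen values map to -1 rather than NaN, which can be a safer sentinel than the `NaN` produced by `.map()` on an incomplete dict.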
megadf
megadf['Group'] = megadf['Group'].map(str)
megadf['Option'] = megadf['Option'].map(str)
#megadf['Option'] = megadf['Group'] + ' ' + megadf['Option']
# DF for sankey
BNdf = megadf
BNdf
| | id | Description | Option | Variable | Group |
|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | Class_X | 0 |
| 1 | 1 | GK Score | High GK Score | Class_X | 0 |
| 2 | 3 | GK Score | High GK Score | Class_X | 0 |
| 3 | 5 | GK Score | Low GK Score | Class_X | 0 |
| 4 | 14 | GK Score | Low GK Score | Class_X | 0 |
| ... | ... | ... | ... | ... | ... |
| 31395 | 1751 | Genetic science can contribute to the followin... | Negative | LE3.085 | 11 |
| 31396 | 1754 | Genetic science can contribute to the followin... | None | LE3.085 | 11 |
| 31397 | 1776 | Genetic science can contribute to the followin... | None | LE3.085 | 11 |
| 31398 | 1777 | Genetic science can contribute to the followin... | None | LE3.085 | 11 |
| 31399 | 1836 | Genetic science can contribute to the followin... | None | LE3.085 | 11 |
31400 rows × 5 columns
metadata.Group = metadata['Group'].apply(str)
cat_select = metadata[metadata['Group'].isin(['8', '9'])]
cat_select.head(3)
| | Variable | Description | Option | Group | Composite | Tag |
|---|---|---|---|---|---|---|
| 27 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure | 8 | Special | HR |
| 28 | LE3.046 | Please indicate whether the following endeavou... | Improving fairness in criminal trials | 8 | Special | HR |
| 29 | LE3.047 | Please indicate whether the following endeavou... | Economic growth | 8 | Special | HR |
specialdf.head(3)
| | id | Progress | UserLanguage | Collection | value | Variable | Description | Option | Group | Composite | Tag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 262629 | 0 | 100 | RU | Pilot | Positive | LE3.045 | Please indicate whether the following endeavou... | Positive | 8 | Special | HR |
| 262630 | 1 | 100 | RU | Pilot | Positive | LE3.045 | Please indicate whether the following endeavou... | Positive | 8 | Special | HR |
| 262632 | 3 | 100 | RU | Pilot | Positive | LE3.045 | Please indicate whether the following endeavou... | Positive | 8 | Special | HR |
nspecialdf = pd.merge(cat_select, specialdf, on='Variable')
nspecialdf['Option'] = nspecialdf['Option_x']+' '+nspecialdf['value']
nspecialdf.head(3)
| | Variable | Description_x | Option_x | Group_x | Composite_x | Tag_x | id | Progress | UserLanguage | Collection | value | Description_y | Option_y | Group_y | Composite_y | Tag_y | Option |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure | 8 | Special | HR | 0 | 100 | RU | Pilot | Positive | Please indicate whether the following endeavou... | Positive | 8 | Special | HR | Disease prevention and cure Positive |
| 1 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure | 8 | Special | HR | 1 | 100 | RU | Pilot | Positive | Please indicate whether the following endeavou... | Positive | 8 | Special | HR | Disease prevention and cure Positive |
| 2 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure | 8 | Special | HR | 3 | 100 | RU | Pilot | Positive | Please indicate whether the following endeavou... | Positive | 8 | Special | HR | Disease prevention and cure Positive |
sdf = nspecialdf[['id', 'Variable', 'Description_x', 'Option', 'Group_x']]
sdf.columns = ['id', 'Variable', 'Description', 'Option', 'Group']
sdf.head(5)
| | id | Variable | Description | Option | Group |
|---|---|---|---|---|---|
| 0 | 0 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 1 | 1 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 2 | 3 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 3 | 5 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 4 | 6 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
maindf = megadf[megadf['Variable'] == 'Class_X'].copy()  # copy so the drops below don't warn on a slice
maindf.drop(['Variable', 'Group'], axis=1, inplace=True)
maindf
| | id | Description | Option |
|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score |
| 1 | 1 | GK Score | High GK Score |
| 2 | 3 | GK Score | High GK Score |
| 3 | 5 | GK Score | Low GK Score |
| 4 | 14 | GK Score | Low GK Score |
| ... | ... | ... | ... |
| 6952 | 1875 | Genetic Curiosity | Low Genetic Curiosity |
| 6953 | 1885 | Genetic Curiosity | Medium Genetic Curiosity |
| 6954 | 1886 | Genetic Curiosity | Medium Genetic Curiosity |
| 6955 | 1887 | Genetic Curiosity | Medium Genetic Curiosity |
| 6956 | 1888 | Genetic Curiosity | Low Genetic Curiosity |
6957 rows × 3 columns
new_df = sdf[sdf['Group']=='8']
new_df
| | id | Variable | Description | Option | Group |
|---|---|---|---|---|---|
| 0 | 0 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 1 | 1 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 2 | 3 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 3 | 5 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 4 | 6 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| ... | ... | ... | ... | ... | ... |
| 3827 | 264 | LE3.064 | Please indicate whether the following endeavou... | Other None | 8 |
| 3828 | 266 | LE3.064 | Please indicate whether the following endeavou... | Other None | 8 |
| 3829 | 276 | LE3.064 | Please indicate whether the following endeavou... | Other None | 8 |
| 3830 | 310 | LE3.064 | Please indicate whether the following endeavou... | Other None | 8 |
| 3831 | 317 | LE3.064 | Please indicate whether the following endeavou... | Other None | 8 |
3832 rows × 5 columns
nnmegadf = pd.merge(maindf, new_df, on='id', how='outer')
nnmegadf = nnmegadf.dropna()
nnnmegadf = pd.DataFrame(nnmegadf).reset_index()
nnmegadf['id'] = 1
nnmegadf.dropna(subset=['Option_y'], inplace=True)
nnmegadf.dropna(subset=['Option_x'], inplace=True)
nnmegadf = nnmegadf[nnmegadf['Option_y'] != ' ']
cb_xn_fin = nnmegadf.groupby(['Option_x', 'Option_y'])['id'].sum().reset_index()
naindf = pd.DataFrame(maindf).reset_index()
naindf.id = 1
naindf.columns=['index', 'count','Description', 'Option_x']
naindf = naindf.groupby('Option_x')['count'].sum().reset_index()
new_cb_xn_fin = pd.merge(cb_xn_fin, naindf, on='Option_x')
new_cb_xn_fin
cb_xn = new_cb_xn_fin
cb_xf = cb_xn
cb_xf['prop'] = (cb_xf['id']/cb_xf['count']).round(3)
cb_xn_fin.head(2)
| | Option_x | Option_y | id |
|---|---|---|---|
| 0 | Female Participants | Altering own traits aka biohacking Negative | 10 |
| 1 | Female Participants | Altering own traits aka biohacking None | 11 |
df = cb_xn
df = df.pivot(index='Option_x', columns='Option_y', values='id')
df = df.fillna(0)
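The pivot above reshapes pre-aggregated counts into a matrix. Starting from the raw row-level pairs, `pd.crosstab` with `normalize='index'` produces the proportion matrix in one step. A toy sketch (labels hypothetical, not the survey's actual categories):

```python
import pandas as pd

# One row per (respondent group, item+opinion) observation.
rows = pd.DataFrame({
    "Option_x": ["Female", "Female", "Female", "Male"],
    "Option_y": ["Positive", "Positive", "Negative", "Positive"],
})

# Cell counts, then proportions within each Option_x row (each row sums to 1).
tab = pd.crosstab(rows["Option_x"], rows["Option_y"], normalize="index").round(3)
```

Missing combinations come out as 0.0 automatically, so no separate `fillna(0)` pass is needed.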
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto")
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Absolute Count of Selected Option",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.066-085: Genetic science can contribute to the following social changes. <br>Indicate whether you consider these endeavours positive neutral or negative for society (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
import dash_bio
fig = dash_bio.Clustergram(
data=df,
column_labels=list(df.columns.values),
row_labels=list(df.index),
height=720,
width=1080
)
for template in ["seaborn"]:
fig.update_layout(template=template)
fig.write_html("/home/manu10/Downloads/iglas_work/non_hard_select_all_gr_relations.html")
fig.show()
df = cb_xf
df = df.pivot(index='Option_x', columns='Option_y', values='prop')
df = df.fillna(0)
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto")
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Proportional Count of Selected Option <br> (Positive + Negative + None = 1, for each item)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.066-085: Genetic science can contribute to the following social changes. <br>Indicate whether you consider these endeavours positive neutral or negative for society (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Ariel, ariel",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
# plot heatmap
data = df
fig, ax = plt.subplots(figsize=(40,10), dpi = 300)
ax = sns.heatmap(data, annot=True, linewidths=.5)
# turn the axis label
for item in ax.get_yticklabels():
item.set_rotation(0)
item.set_va('center')
for item in ax.get_xticklabels():
item.set_rotation(90)
# save figure
#plt.savefig('seabornPandas.png', dpi=300)
plt.show()
import dash_bio
fig = dash_bio.Clustergram(
data=df,
column_labels=list(df.columns.values),
row_labels=list(df.index),
height=720,
width=1080
)
for template in ["seaborn"]:
fig.update_layout(template=template)
fig.write_html("/home/manu10/Downloads/iglas_work/non_hard_select_all_gr_relations.html")
fig.show()
cb_xf.sort_values("prop", inplace=True)
fig_high = px.bar(cb_xf, x="Option_y", color="prop",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x'
)
fig_high.update_layout(
title="Variable LE3.066-085: Genetic science can contribute to the following social changes. <br>Indicate whether you consider these endeavours positive neutral or negative for society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range[0:1] for each item for each group)",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=False)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
xf_new = cb_xf
# split "Item Opinion" labels on the last space only: item text may contain spaces
xf_new['Option_z'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[0]
xf_new['Opinion'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[1]
xf_new.sort_values(["prop", "Opinion"], inplace=True)
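The split above peels the trailing opinion word off each combined label. A minimal sketch of that behaviour, using labels of the same shape as those in the table above:

```python
import pandas as pd

# Labels of the form "<item text> <Opinion>", where the item itself contains spaces.
s = pd.Series(["Disease prevention and cure Positive",
               "Increased Longevity Negative"])

# rsplit with n=1 cuts only at the LAST space, so multi-word items stay intact.
item = s.str.rsplit(" ", n=1).str[0]
opinion = s.str.rsplit(" ", n=1).str[1]
```

A plain `str.split(" ")` would instead break "Disease prevention and cure" into five pieces, which is why the right-hand split is used here.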
fig_high = px.bar(cb_xf, x="Option_z", color="prop",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x',
facet_row='Opinion'
)
fig_high.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>positive negative or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range[0:1] for each item for each group)",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
# hide subplot y-axis titles and x-axis titles
for axis in fig_high.layout:
if type(fig_high.layout[axis]) == go.layout.YAxis:
fig_high.layout[axis].title.text = 'Sum of all proportions <br> (range[0:1] for each item for each group)'
if type(fig_high.layout[axis]) == go.layout.XAxis:
fig_high.layout[axis].title.text = 'Item + Opinion'
# ensure that each chart has its own y range and tick labels
fig_high.update_yaxes(matches=None, showticklabels=True, visible=True)
fig_high = px.bar(cb_xf, x="Option_y", color="Option_x",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x'
)
fig_high.update_layout(
title="Variable LE3.066-085: Genetic science can contribute to the following social changes. <br>Indicate whether you consider these endeavours positive neutral or negative for society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
fig.update_xaxes(
title_text = "Items",
title_standoff = 50,
tickmode='linear')
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
# filter for less than 5%
xf_new = cb_xf
#xf_new = cb_xf[cb_xf['prop']>=0.05]
# colour by opinion type: split "Item Opinion" labels on the last space
xf_new['Option_z'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[0]
xf_new['Opinion'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[1]
xf_new
| | Option_x | Option_y | id | count | prop | Option_z | Opinion |
|---|---|---|---|---|---|---|---|
| 880 | Younger Participants | Increased Longevity Negative | 1 | 599 | 0.002 | Increased Longevity | Negative |
| 21 | Female Participants | Increased Longevity Negative | 1 | 497 | 0.002 | Increased Longevity | Negative |
| 377 | Low GK Score | Personalised education Negative | 1 | 496 | 0.002 | Personalised education | Negative |
| 889 | Younger Participants | Reducing food scarcity Negative | 1 | 599 | 0.002 | Reducing food scarcity | Negative |
| 18 | Female Participants | Improving fairness in criminal trials Negative | 1 | 497 | 0.002 | Improving fairness in criminal trials | Negative |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 722 | Older Participants | Improving fairness in criminal trials Positive | 24 | 174 | 0.138 | Improving fairness in criminal trials | Positive |
| 675 | Not Students | New medical technologies Positive | 60 | 424 | 0.142 | New medical technologies | Positive |
| 664 | Not Students | Disease prevention and cure Positive | 62 | 424 | 0.146 | Disease prevention and cure | Positive |
| 718 | Older Participants | Disease prevention and cure Positive | 27 | 174 | 0.155 | Disease prevention and cure | Positive |
| 729 | Older Participants | New medical technologies Positive | 27 | 174 | 0.155 | New medical technologies | Positive |
910 rows × 7 columns
fig = px.bar(xf_new, x='Option_y', y='prop', height=1080, color='Opinion')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
title="Variable LE3.066-085: Genetic science can contribute to the following social changes. <br>Indicate whether you consider these endeavours positive neutral or negative for society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig.update_yaxes(
title_text = "Sum of all proportions <br> (range[0:1], for each grouping)",
title_standoff = 50,
tickmode='linear')
fig.update_xaxes(
title_text = "Items",
title_standoff = 50,
tickmode='linear')
fig.show()
fig = px.bar(xf_new, x='Option_z', y='prop', height=1080, color='Opinion')
fig.update_layout(
title="Variable LE3.066-085: Genetic science can contribute to the following social changes. <br>Indicate whether you consider these endeavours positive neutral or negative for society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig.update_yaxes(
title_text = "Sum of all proportions <br> (range[0:1], for each grouping)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.show()
metadata.Group = metadata['Group'].apply(str)
cat_select = metadata[metadata['Group'].isin(['8', '9'])]
nspecialdf = pd.merge(cat_select, specialdf, on='Variable')
nspecialdf['Option'] = nspecialdf['Option_x']+' '+nspecialdf['value']
sdf = nspecialdf[['id', 'Variable', 'Description_x', 'Option', 'Group_x']]
sdf.columns = ['id', 'Variable', 'Description', 'Option', 'Group']
maindf = megadf[megadf['Variable'] == 'Class_X'].copy()  # copy so the drops below don't warn on a slice
maindf.drop(['Variable', 'Group'], axis=1, inplace=True)
new_df = sdf[sdf['Group']=='9']
nnmegadf = pd.merge(maindf, new_df, on='id', how='outer')
nnmegadf = nnmegadf.dropna()
nnnmegadf = pd.DataFrame(nnmegadf).reset_index()
nnmegadf['id'] = 1
nnmegadf.dropna(subset=['Option_y'], inplace=True)
nnmegadf.dropna(subset=['Option_x'], inplace=True)
nnmegadf = nnmegadf[nnmegadf['Option_y'] != ' ']
cb_xn_fin = nnmegadf.groupby(['Option_x', 'Option_y'])['id'].sum().reset_index()
naindf = pd.DataFrame(maindf).reset_index()
naindf.id = 1
naindf.columns=['index', 'count','Description', 'Option_x']
naindf = naindf.groupby('Option_x')['count'].sum().reset_index()
new_cb_xn_fin = pd.merge(cb_xn_fin, naindf, on='Option_x')
new_cb_xn_fin
cb_xn = new_cb_xn_fin
cb_xf = cb_xn
cb_xf['prop'] = (cb_xf['id']/cb_xf['count']).round(3)
####################
# pivot this group's absolute counts before plotting (df still held the previous pivot)
df = cb_xf.pivot(index='Option_x', columns='Option_y', values='id')
df = df.fillna(0)
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto", height=1080)
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Absolute Count of Selected Option",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>positive negative or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
import dash_bio
fig = dash_bio.Clustergram(
data=df,
column_labels=list(df.columns.values),
row_labels=list(df.index),
height=720,
width=1080
)
for template in ["seaborn"]:
fig.update_layout(template=template)
fig.write_html("/home/manu10/Downloads/iglas_work/non_hard_select_all_gr_relations.html")
fig.show()
df = cb_xf
df = df.pivot(index='Option_x', columns='Option_y', values='prop')
df = df.fillna(0)
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto")
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Proportional Count of Selected Option <br> (Positive + Negative + None = 1, for each item)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>positive negative or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
# plot heatmap
data = df
fig, ax = plt.subplots(figsize=(40,10), dpi = 300)
ax = sns.heatmap(data, annot=True, linewidths=.5)
# turn the axis label
for item in ax.get_yticklabels():
item.set_rotation(0)
item.set_va('center')
for item in ax.get_xticklabels():
item.set_rotation(90)
# save figure
#plt.savefig('seabornPandas.png', dpi=300)
plt.show()
import dash_bio
fig = dash_bio.Clustergram(
data=df,
column_labels=list(df.columns.values),
row_labels=list(df.index),
height=720,
width=1080
)
for template in ["seaborn"]:
fig.update_layout(template=template)
fig.write_html("/home/manu10/Downloads/iglas_work/non_hard_select_all_gr_relations.html")
fig.show()
cb_xf.sort_values("prop", inplace=True)
fig_high = px.bar(cb_xf, x="Option_y", color="prop",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x'
)
fig_high.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>positive negative or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range[0:1] for each item for each group)",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=False)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
xf_new
| | Option_x | Option_y | id | count | prop | Option_z | Opinion |
|---|---|---|---|---|---|---|---|
| 880 | Younger Participants | Increased Longevity Negative | 1 | 599 | 0.002 | Increased Longevity | Negative |
| 21 | Female Participants | Increased Longevity Negative | 1 | 497 | 0.002 | Increased Longevity | Negative |
| 377 | Low GK Score | Personalised education Negative | 1 | 496 | 0.002 | Personalised education | Negative |
| 889 | Younger Participants | Reducing food scarcity Negative | 1 | 599 | 0.002 | Reducing food scarcity | Negative |
| 18 | Female Participants | Improving fairness in criminal trials Negative | 1 | 497 | 0.002 | Improving fairness in criminal trials | Negative |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 722 | Older Participants | Improving fairness in criminal trials Positive | 24 | 174 | 0.138 | Improving fairness in criminal trials | Positive |
| 675 | Not Students | New medical technologies Positive | 60 | 424 | 0.142 | New medical technologies | Positive |
| 664 | Not Students | Disease prevention and cure Positive | 62 | 424 | 0.146 | Disease prevention and cure | Positive |
| 718 | Older Participants | Disease prevention and cure Positive | 27 | 174 | 0.155 | Disease prevention and cure | Positive |
| 729 | Older Participants | New medical technologies Positive | 27 | 174 | 0.155 | New medical technologies | Positive |
910 rows × 7 columns
xf_new = cb_xf
# split "Item Opinion" labels on the last space only: item text may contain spaces
xf_new['Option_z'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[0]
xf_new['Opinion'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[1]
xf_new.sort_values(["prop", "Opinion"], inplace=True)
fig_high = px.bar(cb_xf, x="Option_z", color="prop",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x',
facet_row='Opinion'
)
fig_high.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>positive negative or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range[0:1] for each item for each group)",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=False)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
# hide subplot y-axis titles and x-axis titles
for axis in fig_high.layout:
if type(fig_high.layout[axis]) == go.layout.YAxis:
fig_high.layout[axis].title.text = 'Sum of all proportions <br> (range[0:1] for each item for each group)'
if type(fig_high.layout[axis]) == go.layout.XAxis:
fig_high.layout[axis].title.text = 'Item + Opinion'
# ensure that each chart has its own y range and tick labels
fig_high.update_yaxes(matches=None, showticklabels=True, visible=True)
xf_new = cb_xf
# split "Item Opinion" labels on the last space only: item text may contain spaces
xf_new['Option_z'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[0]
xf_new['Opinion'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[1]
xf_new.sort_values(["prop", "Opinion"], inplace=True)
fig_high = px.bar(cb_xf, x="Option_z", color="Option_x",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x',
facet_row='Opinion'
)
fig_high.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>positive negative or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
# hide subplot y-axis titles and x-axis titles
for axis in fig_high.layout:
if type(fig_high.layout[axis]) == go.layout.YAxis:
fig_high.layout[axis].title.text = 'Sum of all proportions <br> (range[0:1] for each item for each group)'
if type(fig_high.layout[axis]) == go.layout.XAxis:
fig_high.layout[axis].title.text = 'Item + Opinion'
# ensure that each chart has its own y range and tick labels
fig_high.update_yaxes(matches=None, showticklabels=True, visible=True)
# filter for less than 5%
xf_new = cb_xf
#xf_new = cb_xf[cb_xf['prop']>=0.05]
# colour by opinion type: split "Item Opinion" labels on the last space
xf_new['Option_z'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[0]
xf_new['Opinion'] = xf_new['Option_y'].str.rsplit(' ', n=1).str[1]
fig = px.bar(xf_new, x='Option_y', y='prop', height=1080, color='Opinion')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>positive negative or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig.update_yaxes(
title_text = "Sum of all proportions <br> (range[0:1], for each grouping)",
title_standoff = 50,
tickmode='linear')
fig.update_xaxes(
title_text = "Items",
title_standoff = 50,
tickmode='linear')
fig.show()
fig = px.bar(xf_new, x='Option_z', y='prop', height=1080, color='Opinion')
fig.update_layout(
title="Variable LE3.045-064: Please indicate whether the following endeavours have <br>a positive, negative, or no impact on society (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, arial",
size=10,
color="RebeccaPurple"
),
barmode="stack",
)
fig.update_yaxes(
title_text = "Sum of all proportions <br> (range[0:1], for each grouping)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.show()
maindf = megadf[megadf['Variable'] == 'Class_X'].copy()  # copy avoids SettingWithCopyWarning below
maindf.drop(columns=['Variable', 'Group'], inplace=True)
maindf
| id | Description | Option | |
|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score |
| 1 | 1 | GK Score | High GK Score |
| 2 | 3 | GK Score | High GK Score |
| 3 | 5 | GK Score | Low GK Score |
| 4 | 14 | GK Score | Low GK Score |
| ... | ... | ... | ... |
| 6952 | 1875 | Genetic Curiosity | Low Genetic Curiosity |
| 6953 | 1885 | Genetic Curiosity | Medium Genetic Curiosity |
| 6954 | 1886 | Genetic Curiosity | Medium Genetic Curiosity |
| 6955 | 1887 | Genetic Curiosity | Medium Genetic Curiosity |
| 6956 | 1888 | Genetic Curiosity | Low Genetic Curiosity |
6957 rows × 3 columns
list_ids = list(maindf.id.unique())
ndf_27['id'] = ndf_27['id'].astype(int)
con_27 = ndf_27[ndf_27['id'].isin(list_ids)]
con_27
| index | id | Description | Option | Variable | Group | |
|---|---|---|---|---|---|---|
| 8846 | 22858 | 1 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8847 | 22859 | 3 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8852 | 22864 | 14 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8882 | 22894 | 131 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8884 | 22896 | 134 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| ... | ... | ... | ... | ... | ... | ... |
| 12732 | 26744 | 1271 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12734 | 26746 | 1346 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12736 | 26748 | 1499 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12738 | 26750 | 1602 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12739 | 26751 | 1645 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
2088 rows × 6 columns
nspecialdf = reduce(lambda x,y: pd.merge(x,y, on='id', how='outer'), [maindf, con_27])
nspecialdf['Option'] = nspecialdf['Option_x']+' '+nspecialdf['Option_y']
nspecialdf.head(5)
| id | Description_x | Option_x | index | Description_y | Option_y | Variable | Group | Option | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Low GK Score Other |
| 1 | 0 | Gender | Female Participants | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Female Participants Other |
| 2 | 0 | Age | Older Participants | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Older Participants Other |
| 3 | 0 | Confidence in GK | Low GK Confidence | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Low GK Confidence Other |
| 4 | 0 | Related/ Not related to law | Participants not related to law | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Participants not related to law Other |
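The `reduce`-based outer merge above can be illustrated on toy frames (the contents below are invented for illustration, not the survey data): successive `pd.merge` calls collapse a list of frames onto the shared `id` key, and the outer join keeps ids present in either frame, filling unmatched cells with NaN.

```python
from functools import reduce
import pandas as pd

# Two toy frames sharing an 'id' key; id 1 exists only in a, id 2 only in b.
a = pd.DataFrame({'id': [0, 1], 'Option_x': ['Low GK Score', 'High GK Score']})
b = pd.DataFrame({'id': [0, 2], 'Option_y': ['Other', 'Do not know']})

# Fold the list of frames into one wide frame on 'id'.
merged = reduce(lambda x, y: pd.merge(x, y, on='id', how='outer'), [a, b])
# Unmatched ids survive the outer join with NaN in the missing columns,
# which is why the notebook drops NaN rows right after this step.
```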
nnmegadf = nspecialdf.dropna().copy()
nnnmegadf = nnmegadf.reset_index()  # keeps the original ids for the N shown in figure titles
nnmegadf['id'] = 1                  # constant 1 so groupby sums yield counts
nnmegadf.dropna(subset=['Option_y'], inplace=True)
nnmegadf.dropna(subset=['Option_x'], inplace=True)
nnmegadf = nnmegadf[nnmegadf['Option_y'] != ' ']
cb_xn_fin = nnmegadf.groupby(['Option_x', 'Option_y'])['id'].sum().reset_index()
naindf = pd.DataFrame(maindf).reset_index()
naindf.id = 1
naindf.columns=['index', 'count','Description', 'Option_x']
naindf = naindf.groupby('Option_x')['count'].sum().reset_index()
new_cb_xn_fin = pd.merge(cb_xn_fin, naindf, on='Option_x')
new_cb_xn_fin
cb_xn = new_cb_xn_fin
cb_xf = cb_xn
cb_xf['prop'] = (cb_xf['id']/cb_xf['count']).round(3)
df = cb_xn
df = df.pivot(index='Option_x', columns='Option_y', values='id')
df = df.fillna(0)
####################
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto", height =1080, width=1720)
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Category (cells: absolute count of selected option)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variables LE2.122-130: What concerns do participants have in relation to genetic testing <br> (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
df = cb_xf
df = df.pivot(index='Option_x', columns='Option_y', values='prop')
df = df.fillna(0)
####################
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto", height =1080, width=1720)
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Category (cells: proportion of selected option within category)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variables LE2.122-130: What concerns do participants have in relation to genetic testing <br> (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
import dash_bio
fig = dash_bio.Clustergram(
data=df,
column_labels=list(df.columns.values),
row_labels=list(df.index),
height=1080,
width=1080
)
fig.update_layout(template="seaborn")
fig.write_html("/home/manu10/Downloads/iglas_work/non_hard_select_all_gr_relations.html")
fig.update_xaxes(
tickangle = 45,)
fig.update_layout(
title="Variables LE2.122-130: What concerns do participants have in relation to genetic testing <br> (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig.show()
cb_xf.sort_values("prop", inplace=True)
fig_high = px.bar(cb_xf, x="Option_y", color="prop",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x'
)
fig_high.update_layout(
title="Variables LE2.122-130: What concerns do participants have in relation to genetic testing <br> (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range varies with group size)",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=False)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear',
tickangle = 45
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
cb_xf.sort_values("prop", inplace=True)
fig_high = px.bar(cb_xf, x="Option_y", color="Option_x",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x'
)
fig_high.update_layout(
title="Variables LE2.122-130: What concerns do participants have in relation to genetic testing <br> (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range varies with group size)",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear',
tickangle = 45
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
cb_xf.sort_values("prop", inplace=True)
fig_high = px.bar(cb_xn, x="Option_y", color="Option_x",
y='id',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x'
)
fig_high.update_layout(
title="Variables LE2.122-130: What concerns do participants have in relation to genetic testing <br> (N={})".format(len(nnnmegadf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all counts <br> (range varies with group size and counts)",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear',
tickangle = 45
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
cb_xf
| Option_x | Option_y | id | count | prop | |
|---|---|---|---|---|---|
| 17 | High Concern | Other | 1 | 224 | 0.004 |
| 161 | Participants related to law | Other | 3 | 270 | 0.011 |
| 107 | Medium Concern | Other | 4 | 348 | 0.011 |
| 53 | Law Students | Other | 3 | 269 | 0.011 |
| 60 | Low Concern | I would not want to be labelled as having any ... | 4 | 201 | 0.020 |
| ... | ... | ... | ... | ... | ... |
| 128 | Not Students | I am concerned my data will be used for other ... | 304 | 424 | 0.717 |
| 10 | High Concern | Do not know who will have access to that infor... | 197 | 224 | 0.879 |
| 9 | High Concern | Do not know whether the data will be stored se... | 198 | 224 | 0.884 |
| 13 | High Concern | I am worried some information about my physica... | 214 | 224 | 0.955 |
| 11 | High Concern | I am concerned my data will be used for other ... | 214 | 224 | 0.955 |
180 rows × 5 columns
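A quick sanity check of the `prop` column displayed above (a hedged illustration, with the numbers read off the High Concern rows of `cb_xf`): each value is the count of participants in a category who selected the option (`id`) divided by the category size (`count`), rounded to three decimals.

```python
# High Concern x "Do not know who will have access...": 197 of 224 participants.
selected, group_size = 197, 224
prop = round(selected / group_size, 3)
# Matches the 0.879 shown in the cb_xf table above.
```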
cb_xf.sort_values(["prop", "count"], inplace=True)
fig_high = px.bar(cb_xf, x="Option_y", color="prop",
y='Option_x',
title="Total opinions across categories",
barmode='stack',
height=2080, width=1720,
text='Option_x'
)
fig_high.update_layout(
title="Variables LE2.122-130: What concerns do participants have in relation to genetic testing <br> (N={})".format(len(nnnmegadf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
for axis in fig_high.layout:
if type(fig_high.layout[axis]) == go.layout.XAxis:
fig_high.layout[axis].title.text = ''
fig_high.update_layout(
# keep the original annotations and add a list of new annotations:
annotations = list(fig_high.layout.annotations) +
[go.layout.Annotation(
x=-0.07,
y=0.5,
font=dict(
size=14
),
showarrow=False,
text="",
textangle=-90,
xref="paper",
yref="paper"
)
]
)
fig_high.update_layout(yaxis={'visible': True, 'showticklabels': False})
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear',
tickangle = 45
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
xdf = new_large_df
## Drop blank groups, then keep only group 24
xdf["Group"] = xdf["Group"].map(str)
mask = xdf["Group"] != ' '
xdf = xdf[mask]
select = ['24']
xdf = xdf[xdf['Group'].isin(select)]
xdf['Option'] = xdf['Option']+' - '+xdf['value']
xdf = xdf.drop(columns=['Composite', 'Progress', 'UserLanguage', 'Collection',
                        'Variable', 'Tag', 'value', 'level_0', 'index'])
xdf.head(3)
| id | Description | Option | Group | |
|---|---|---|---|---|
| 33326 | 0 | Would you be interested in finding out about g... | Future spouse or partner - Most Likely | 24 |
| 33327 | 1 | Would you be interested in finding out about g... | Future spouse or partner - Under certain circu... | 24 |
| 33328 | 3 | Would you be interested in finding out about g... | Future spouse or partner - Definitely | 24 |
ndf = pd.merge(maindf, xdf, on='id')
ndf.head(3)
| id | Description_x | Option_x | Description_y | Option_y | Group | |
|---|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | Would you be interested in finding out about g... | Future spouse or partner - Most Likely | 24 |
| 1 | 0 | GK Score | Low GK Score | Would you be interested in finding out about g... | Spouse or partner - Most Likely | 24 |
| 2 | 0 | GK Score | Low GK Score | Would you be interested in finding out about g... | Children - Definitely | 24 |
nndf = ndf.reset_index()  # keeps the original ids for the N shown in figure titles
ndf['id'] = 1             # constant 1 so groupby sums yield counts
ndf.dropna(subset=['Option_y'], inplace=True)
ndf.dropna(subset=['Option_x'], inplace=True)
ndf = ndf[ndf['Option_y'] != ' ']
cb_xn_fin = ndf.groupby(['Option_x', 'Option_y','Description_y'])['id'].sum().reset_index()
cb_xn_fin
| Option_x | Option_y | Description_y | id | |
|---|---|---|---|---|
| 0 | Female Participants | Children - Definitely | Would you be interested in finding out about g... | 178 |
| 1 | Female Participants | Children - Most Likely | Would you be interested in finding out about g... | 148 |
| 2 | Female Participants | Children - Never | Would you be interested in finding out about g... | 25 |
| 3 | Female Participants | Children - Under certain circumstances | Would you be interested in finding out about g... | 126 |
| 4 | Female Participants | Friends - Definitely | Would you be interested in finding out about g... | 3 |
| ... | ... | ... | ... | ... |
| 548 | Younger Participants | Siblings - Under certain circumstances | Would you be interested in finding out about g... | 215 |
| 549 | Younger Participants | Spouse or partner - Definitely | Would you be interested in finding out about g... | 157 |
| 550 | Younger Participants | Spouse or partner - Most Likely | Would you be interested in finding out about g... | 194 |
| 551 | Younger Participants | Spouse or partner - Never | Would you be interested in finding out about g... | 40 |
| 552 | Younger Participants | Spouse or partner - Under certain circumstances | Would you be interested in finding out about g... | 148 |
553 rows × 4 columns
cb_xn_fin['total'] = 1
total = cb_xn_fin.groupby(['Option_x']).total.sum().reset_index()
cb_xn = pd.merge(cb_xn_fin, total, on='Option_x')
naindf = pd.DataFrame(maindf).reset_index()
naindf.id = 1
naindf.columns=['index', 'count','Description', 'Option_x']
naindf = naindf.groupby('Option_x')['count'].sum().reset_index()
new_cb_xn_fin = pd.merge(cb_xn_fin, naindf, on='Option_x')
new_cb_xn_fin
| Option_x | Option_y | Description_y | id | total | count | |
|---|---|---|---|---|---|---|
| 0 | Female Participants | Children - Definitely | Would you be interested in finding out about g... | 178 | 1 | 497 |
| 1 | Female Participants | Children - Most Likely | Would you be interested in finding out about g... | 148 | 1 | 497 |
| 2 | Female Participants | Children - Never | Would you be interested in finding out about g... | 25 | 1 | 497 |
| 3 | Female Participants | Children - Under certain circumstances | Would you be interested in finding out about g... | 126 | 1 | 497 |
| 4 | Female Participants | Friends - Definitely | Would you be interested in finding out about g... | 3 | 1 | 497 |
| ... | ... | ... | ... | ... | ... | ... |
| 548 | Younger Participants | Siblings - Under certain circumstances | Would you be interested in finding out about g... | 215 | 1 | 599 |
| 549 | Younger Participants | Spouse or partner - Definitely | Would you be interested in finding out about g... | 157 | 1 | 599 |
| 550 | Younger Participants | Spouse or partner - Most Likely | Would you be interested in finding out about g... | 194 | 1 | 599 |
| 551 | Younger Participants | Spouse or partner - Never | Would you be interested in finding out about g... | 40 | 1 | 599 |
| 552 | Younger Participants | Spouse or partner - Under certain circumstances | Would you be interested in finding out about g... | 148 | 1 | 599 |
553 rows × 6 columns
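The denominator pattern above can be sketched on a toy frame (invented data, not the survey frame): a constant-1 column summed per category gives group sizes, which are merged back so every row carries its category's total, and the proportion is then a row-wise division.

```python
import pandas as pd

# Three toy responses: two in category A, one in category B.
rows = pd.DataFrame({'Option_x': ['A', 'A', 'B'], 'id': [1, 1, 1]})

# Sum the constant-1 column per category to get group sizes...
sizes = rows.groupby('Option_x')['id'].sum().reset_index(name='count')

# ...then merge the sizes back so each row sees its denominator.
with_sizes = pd.merge(rows, sizes, on='Option_x')
with_sizes['prop'] = (with_sizes['id'] / with_sizes['count']).round(3)
```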
cb_xf = new_cb_xn_fin
cb_xf['prop'] = (cb_xf['id']/cb_xf['count']).round(3)
df = cb_xn
df = df.pivot(index='Option_x', columns='Option_y', values='id')
df = df.fillna(0)
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto")
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Category (cells: absolute count of selected option)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.101-107: Would you be interested in finding out about genetic information (N={})".format(len(nndf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
df = cb_xf
df = df.pivot(index='Option_x', columns='Option_y', values='prop')
df = df.fillna(0)
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto")
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Category (cells: proportion of selected option within category)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.101-107: Would you be interested in finding out about genetic information (N={})".format(len(nndf.id.unique())),
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
import dash_bio
fig = dash_bio.Clustergram(
data=df,
column_labels=list(df.columns.values),
row_labels=list(df.index),
height=720,
width=1080
)
fig.update_layout(template="seaborn")
fig.write_html("/home/manu10/Downloads/iglas_work/non_hard_select_all_gr_relations.html")
fig.show()
cb_xf
| Option_x | Option_y | Description_y | id | total | count | prop | |
|---|---|---|---|---|---|---|---|
| 0 | Female Participants | Children - Definitely | Would you be interested in finding out about g... | 178 | 1 | 497 | 0.358 |
| 1 | Female Participants | Children - Most Likely | Would you be interested in finding out about g... | 148 | 1 | 497 | 0.298 |
| 2 | Female Participants | Children - Never | Would you be interested in finding out about g... | 25 | 1 | 497 | 0.050 |
| 3 | Female Participants | Children - Under certain circumstances | Would you be interested in finding out about g... | 126 | 1 | 497 | 0.254 |
| 4 | Female Participants | Friends - Definitely | Would you be interested in finding out about g... | 3 | 1 | 497 | 0.006 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 548 | Younger Participants | Siblings - Under certain circumstances | Would you be interested in finding out about g... | 215 | 1 | 599 | 0.359 |
| 549 | Younger Participants | Spouse or partner - Definitely | Would you be interested in finding out about g... | 157 | 1 | 599 | 0.262 |
| 550 | Younger Participants | Spouse or partner - Most Likely | Would you be interested in finding out about g... | 194 | 1 | 599 | 0.324 |
| 551 | Younger Participants | Spouse or partner - Never | Would you be interested in finding out about g... | 40 | 1 | 599 | 0.067 |
| 552 | Younger Participants | Spouse or partner - Under certain circumstances | Would you be interested in finding out about g... | 148 | 1 | 599 | 0.247 |
553 rows × 7 columns
cb_xf
| Option_x | Option_y | Description_y | id | total | count | prop | |
|---|---|---|---|---|---|---|---|
| 0 | Female Participants | Children - Definitely | Would you be interested in finding out about g... | 178 | 1 | 497 | 0.358 |
| 1 | Female Participants | Children - Most Likely | Would you be interested in finding out about g... | 148 | 1 | 497 | 0.298 |
| 2 | Female Participants | Children - Never | Would you be interested in finding out about g... | 25 | 1 | 497 | 0.050 |
| 3 | Female Participants | Children - Under certain circumstances | Would you be interested in finding out about g... | 126 | 1 | 497 | 0.254 |
| 4 | Female Participants | Friends - Definitely | Would you be interested in finding out about g... | 3 | 1 | 497 | 0.006 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 548 | Younger Participants | Siblings - Under certain circumstances | Would you be interested in finding out about g... | 215 | 1 | 599 | 0.359 |
| 549 | Younger Participants | Spouse or partner - Definitely | Would you be interested in finding out about g... | 157 | 1 | 599 | 0.262 |
| 550 | Younger Participants | Spouse or partner - Most Likely | Would you be interested in finding out about g... | 194 | 1 | 599 | 0.324 |
| 551 | Younger Participants | Spouse or partner - Never | Would you be interested in finding out about g... | 40 | 1 | 599 | 0.067 |
| 552 | Younger Participants | Spouse or partner - Under certain circumstances | Would you be interested in finding out about g... | 148 | 1 | 599 | 0.247 |
553 rows × 7 columns
xf_new = pd.DataFrame(cb_xf).reset_index()
# split "Item - Opinion" labels: text after the last '-' is the opinion,
# text before it is the item stem
u = xf_new['Option_y'].str.partition()
un = pd.DataFrame({'Opinion': u[2].str.split('-').str[-1].str.strip()})
xf_new['Option_z'] = xf_new['Option_y'].str.rsplit('-', n=1).str[0].str.strip()
xf_new['Opinion'] = un['Opinion']
xf_new
| index | Option_x | Option_y | Description_y | id | total | count | prop | Option_z | Opinion | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Female Participants | Children - Definitely | Would you be interested in finding out about g... | 178 | 1 | 497 | 0.358 | Children | Definitely |
| 1 | 1 | Female Participants | Children - Most Likely | Would you be interested in finding out about g... | 148 | 1 | 497 | 0.298 | Children | Most Likely |
| 2 | 2 | Female Participants | Children - Never | Would you be interested in finding out about g... | 25 | 1 | 497 | 0.050 | Children | Never |
| 3 | 3 | Female Participants | Children - Under certain circumstances | Would you be interested in finding out about g... | 126 | 1 | 497 | 0.254 | Children | Under certain circumstances |
| 4 | 4 | Female Participants | Friends - Definitely | Would you be interested in finding out about g... | 3 | 1 | 497 | 0.006 | Friends | Definitely |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 548 | 548 | Younger Participants | Siblings - Under certain circumstances | Would you be interested in finding out about g... | 215 | 1 | 599 | 0.359 | Siblings | Under certain circumstances |
| 549 | 549 | Younger Participants | Spouse or partner - Definitely | Would you be interested in finding out about g... | 157 | 1 | 599 | 0.262 | Spouse or partner | Definitely |
| 550 | 550 | Younger Participants | Spouse or partner - Most Likely | Would you be interested in finding out about g... | 194 | 1 | 599 | 0.324 | Spouse or partner | Most Likely |
| 551 | 551 | Younger Participants | Spouse or partner - Never | Would you be interested in finding out about g... | 40 | 1 | 599 | 0.067 | Spouse or partner | Never |
| 552 | 552 | Younger Participants | Spouse or partner - Under certain circumstances | Would you be interested in finding out about g... | 148 | 1 | 599 | 0.247 | Spouse or partner | Under certain circumstances |
553 rows × 10 columns
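The label split that produces `Option_z` and `Opinion` above boils down to one `rsplit` on the last `-`, which separates the item stem from the response option; a minimal sketch on a single label from the table:

```python
# One label from the xf_new table above.
label = 'Spouse or partner - Under certain circumstances'

# rsplit with n=1 splits only at the last '-'; strip() removes the
# spaces left around the separator.
item, opinion = label.rsplit('-', 1)
item, opinion = item.strip(), opinion.strip()
```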
xf_new.sort_values("prop", inplace=True)
fig_high = px.bar(xf_new, x="Option_z", y='Option_x', color="prop",
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x',
facet_col='Opinion'
)
fig_high.update_layout(
title="Variable LE3.101-107: Would you be interested in finding out about genetic information (N={})".format(len(nndf.id.unique())),
xaxis_title="Options",
yaxis_title="Categories",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
for axis in fig_high.layout:
if type(fig_high.layout[axis]) == go.layout.XAxis:
fig_high.layout[axis].title.text = ''
fig_high.update_layout(
# keep the original annotations and add a list of new annotations:
annotations = list(fig_high.layout.annotations) +
[go.layout.Annotation(
x=-0.07,
y=0.5,
font=dict(
size=14
),
showarrow=False,
text="",
textangle=-90,
xref="paper",
yref="paper"
)
]
)
fig_high.update_layout(yaxis={'visible': True, 'showticklabels': False})
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
xf_new.sort_values("prop", inplace=True)
fig_high = px.bar(xf_new, x="Option_z", color="prop",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x',
facet_col='Opinion'
)
fig_high.update_layout(
title="Variable LE3.101-107: Would you be interested in finding out about genetic information (N={})".format(len(nndf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range [0:1] for each group across all options)",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=False)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
xf_new
| index | Option_x | Option_y | Description_y | id | total | count | prop | Option_z | Opinion | |
|---|---|---|---|---|---|---|---|---|---|---|
| 234 | 234 | Low GK Score | Other - Definitely | Would you be interested in finding out about g... | 1 | 1 | 496 | 0.002 | Other | Definitely |
| 343 | 343 | Medium Genetic Curiosity | Other - Definitely | Would you be interested in finding out about g... | 1 | 1 | 327 | 0.003 | Other | Definitely |
| 316 | 316 | Medium Concern | Other - Most Likely | Would you be interested in finding out about g... | 1 | 1 | 348 | 0.003 | Other | Most Likely |
| 265 | 265 | Low Genetic Curiosity | Other relatives - Most Likely | Would you be interested in finding out about g... | 1 | 1 | 258 | 0.004 | Other relatives | Most Likely |
| 68 | 68 | High GK Confidence | Other - Definitely | Would you be interested in finding out about g... | 1 | 1 | 254 | 0.004 | Other | Definitely |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393 | 393 | Not Students | Friends - Never | Would you be interested in finding out about g... | 252 | 1 | 424 | 0.594 | Friends | Never |
| 420 | 420 | Older Participants | Friends - Never | Would you be interested in finding out about g... | 105 | 1 | 174 | 0.603 | Friends | Never |
| 254 | 254 | Low Genetic Curiosity | Friends - Never | Would you be interested in finding out about g... | 180 | 1 | 258 | 0.698 | Friends | Never |
| 135 | 135 | High Genetic Curiosity | Spouse or partner - Definitely | Would you be interested in finding out about g... | 133 | 1 | 188 | 0.707 | Spouse or partner | Definitely |
| 112 | 112 | High Genetic Curiosity | Children - Definitely | Would you be interested in finding out about g... | 152 | 1 | 188 | 0.809 | Children | Definitely |
553 rows × 10 columns
xf_new.sort_values("prop", inplace=True)
fig_high = px.bar(xf_new, x="Option_x", color="Option_z",
y='prop',
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x',
facet_col='Opinion'
)
fig_high.update_layout(
title="Variable LE3.101-107: Would you be interested in finding out about genetic information (N={})".format(len(nndf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions <br> (range [0:1] for each group across all options)",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
for axis in fig_high.layout:
if type(fig_high.layout[axis]) == go.layout.XAxis:
fig_high.layout[axis].title.text = ''
fig_high.update_layout(
# keep the original annotations and add a list of new annotations:
annotations = list(fig_high.layout.annotations) +
[go.layout.Annotation(
x=-0.07,
y=0.5,
font=dict(
size=14
),
showarrow=False,
text="",
textangle=-90,
xref="paper",
yref="paper"
)
]
)
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
fig = px.bar(xf_new, x='Option_y', y='prop', height=1080, color='Opinion')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
title="Variable LE3.101-107: Would you be interested in finding out about genetic information (N={})".format(len(nndf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig.update_yaxes(
title_text = "Sum of all proportions <br> (range[0:1], for each grouping)",
title_standoff = 50,
tickmode='linear')
fig.update_xaxes(
title_text = "Items",
title_standoff = 50,
tickmode='linear')
fig.show()
fig = px.bar(xf_new, x='Option_z', y='prop', height=1080, color='Opinion')
fig.update_layout(
title="Variable LE3.101-107: Would you be interested in finding out about genetic information (N={})".format(len(nndf.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, arial",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig.update_yaxes(
title_text = "Sum of all proportions <br> (range[0:1], for each grouping)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.show()
categories  # groups 3, 4 and 5 hold the Likert items; group 77 holds Class_X
{'77': 0,
'23': 1,
'24': 2,
'65': 3,
'66': 4,
'67': 5,
'25': 6,
'27': 7,
'29': 8,
'30': 9,
'8': 10,
'9': 11}
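The group selection done just below can be sketched in isolation (toy frame, not `megadf`): the group codes are strings, so the selection list must hold strings too, and `isin` keeps only the matching rows.

```python
import pandas as pd

# Toy frame with string group codes, mirroring megadf's Group column.
df = pd.DataFrame({'Group': ['3', '4', '77', '24'], 'value': [1, 2, 3, 4]})

# Keep only the Likert groups; '5' simply matches nothing here.
likert = df[df['Group'].isin(['3', '4', '5'])]
```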
select = ['3', '4', '5']
select_df = megadf[megadf['Group'].isin(select)]
select_df.head(3)
| id | Description | Option | Variable | Group | |
|---|---|---|---|---|---|
| 12198 | 0 | Dissemination of genetic knowledge to the gene... | Strongly disagree to dissemination of GK | LE3.199 | 3 |
| 12199 | 1 | Dissemination of genetic knowledge to the gene... | Agree to dissemination of GK | LE3.199 | 3 |
| 12200 | 5 | Dissemination of genetic knowledge to the gene... | Agree to dissemination of GK | LE3.199 | 3 |
ndf = pd.merge(maindf, select_df, on='id')
ndf.head(5)
| id | Description_x | Option_x | Description_y | Option_y | Variable | Group | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | Dissemination of genetic knowledge to the gene... | Strongly disagree to dissemination of GK | LE3.199 | 3 |
| 1 | 0 | GK Score | Low GK Score | Policymaking – Contributing to working groups ... | Strongly disagree to Policymaking | LE3.200 | 4 |
| 2 | 0 | GK Score | Low GK Score | Revising and updating ethical guidelines conce... | Strongly disagree to Revising and Updating | LE3.201 | 5 |
| 3 | 0 | Gender | Female Participants | Dissemination of genetic knowledge to the gene... | Strongly disagree to dissemination of GK | LE3.199 | 3 |
| 4 | 0 | Gender | Female Participants | Policymaking – Contributing to working groups ... | Strongly disagree to Policymaking | LE3.200 | 4 |
ndf.Description_y = ndf.Description_y.str.replace('Dissemination of genetic knowledge to the general public', 'Dissemination of GK')
ndf.Description_y = ndf.Description_y.str.replace('Policymaking – Contributing to working groups concerning the regulation of genetic data', 'Policymaking')
ndf.Description_y = ndf.Description_y.str.replace('Revising and updating ethical guidelines concerning genetic research and use of genetic data', 'Revising and updating')
ndf.head(5)
| | id | Description_x | Option_x | Description_y | Option_y | Variable | Group |
|---|---|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | Dissemination of GK | Strongly disagree to dissemination of GK | LE3.199 | 3 |
| 1 | 0 | GK Score | Low GK Score | Policymaking | Strongly disagree to Policymaking | LE3.200 | 4 |
| 2 | 0 | GK Score | Low GK Score | Revising and updating | Strongly disagree to Revising and Updating | LE3.201 | 5 |
| 3 | 0 | Gender | Female Participants | Dissemination of GK | Strongly disagree to dissemination of GK | LE3.199 | 3 |
| 4 | 0 | Gender | Female Participants | Policymaking | Strongly disagree to Policymaking | LE3.200 | 4 |
ndf['id'] = 1  # reuse the id column as a unit count for aggregation
ndf.dropna(subset=['Option_x', 'Option_y'], inplace=True)
ndf = ndf[ndf['Option_y'] != ' ']
cb_xn_fin = ndf.groupby(['Option_x', 'Option_y','Description_y'])['id'].sum().reset_index()
cb_xn_fin
| | Option_x | Option_y | Description_y | id |
|---|---|---|---|---|
| 0 | Female Participants | Agree to Policymaking | Policymaking | 210 |
| 1 | Female Participants | Agree to Revising and Updating | Revising and updating | 234 |
| 2 | Female Participants | Agree to dissemination of GK | Dissemination of GK | 224 |
| 3 | Female Participants | Disagree to Policymaking | Policymaking | 24 |
| 4 | Female Participants | Disagree to Revising and Updating | Revising and updating | 25 |
| ... | ... | ... | ... | ... |
| 295 | Younger Participants | Strongly agree to Revising and Updating | Revising and updating | 154 |
| 296 | Younger Participants | Strongly agree to dissemination of GK | Dissemination of GK | 193 |
| 297 | Younger Participants | Strongly disagree to Policymaking | Policymaking | 35 |
| 298 | Younger Participants | Strongly disagree to Revising and Updating | Revising and updating | 32 |
| 299 | Younger Participants | Strongly disagree to dissemination of GK | Dissemination of GK | 47 |
300 rows × 4 columns
naindf = pd.DataFrame(maindf).reset_index()
naindf.id = 1  # unit counts per row
naindf.columns = ['index', 'count', 'Description', 'Option_x']
naindf = naindf.groupby('Option_x')['count'].sum().reset_index()  # denominator per category
new_cb_xn_fin = pd.merge(cb_xn_fin, naindf, on='Option_x')
new_cb_xn_fin
cb_xn = new_cb_xn_fin
cb_xf = cb_xn
cb_xf['prop'] = (cb_xf['id']/cb_xf['count']).round(3)
df = cb_xn
df = df.pivot(index='Option_x', columns='Option_y', values='id')
df = df.fillna(0)
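The pivot-then-fill step that shapes the heatmap input can be sanity-checked on toy data (hypothetical labels, not the survey responses):

```python
import pandas as pd

# Toy long-format counts (hypothetical labels, not survey data)
toy = pd.DataFrame({
    "Option_x": ["A", "A", "B"],
    "Option_y": ["agree", "disagree", "agree"],
    "id": [3, 1, 2],
})

# Wide matrix for px.imshow: rows = Option_x, columns = Option_y;
# combinations that never occur become 0 instead of NaN
wide = toy.pivot(index="Option_x", columns="Option_y", values="id").fillna(0)
```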
####################
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto")
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Category (cells show absolute counts)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.199-201: Likert items 1 – 5, strongly agree to strongly disagree (N={})".format(len(select_df.id.unique())),
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
)
)
df = cb_xf
df = df.pivot(index='Option_x', columns='Option_y', values='prop')
df = df.fillna(0)
####################
import plotly.express as px
fig = px.imshow(df, text_auto=True, aspect="auto")
fig.update_xaxes(
tickangle = 45,
title_text = "Option",
title_font = {"size": 14},
title_standoff = 25,
tickmode='linear')
fig.update_yaxes(
title_text = "Category (cells show proportions; <br>for each Likert item, proportions sum to 1)",
title_standoff = 50,
tickmode='linear')
fig.update_layout(
title="Variable LE3.199-201: Likert items 1 – 5, strongly agree to strongly disagree (N={})".format(len(select_df.id.unique())),
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
)
)
import dash_bio
fig = dash_bio.Clustergram(
data=df,
column_labels=list(df.columns.values),
row_labels=list(df.index),
height=720,
width=1080
)
fig.update_layout(template="seaborn")
fig.write_html("/home/manu10/Downloads/iglas_work/non_hard_select_all_gr_relations.html")
fig.show()
cb_xf
| | Option_x | Option_y | Description_y | id | count | prop |
|---|---|---|---|---|---|---|
| 0 | Female Participants | Agree to Policymaking | Policymaking | 210 | 497 | 0.423 |
| 1 | Female Participants | Agree to Revising and Updating | Revising and updating | 234 | 497 | 0.471 |
| 2 | Female Participants | Agree to dissemination of GK | Dissemination of GK | 224 | 497 | 0.451 |
| 3 | Female Participants | Disagree to Policymaking | Policymaking | 24 | 497 | 0.048 |
| 4 | Female Participants | Disagree to Revising and Updating | Revising and updating | 25 | 497 | 0.050 |
| ... | ... | ... | ... | ... | ... | ... |
| 295 | Younger Participants | Strongly agree to Revising and Updating | Revising and updating | 154 | 599 | 0.257 |
| 296 | Younger Participants | Strongly agree to dissemination of GK | Dissemination of GK | 193 | 599 | 0.322 |
| 297 | Younger Participants | Strongly disagree to Policymaking | Policymaking | 35 | 599 | 0.058 |
| 298 | Younger Participants | Strongly disagree to Revising and Updating | Revising and updating | 32 | 599 | 0.053 |
| 299 | Younger Participants | Strongly disagree to dissemination of GK | Dissemination of GK | 47 | 599 | 0.078 |
300 rows × 6 columns
cb_xf.sort_values("prop", inplace=True)
fig_high = px.bar(cb_xf, x="Option_x", y='prop', color="Option_y",
title="Total opinions across categories",
barmode='stack',
height=1080,
text='Option_x',
facet_col='Description_y'
)
fig_high.update_layout(
title="Variable LE3.199-201: Opinion on Likert Items (N={})".format(len(select_df.id.unique())),
xaxis_title="Category",
yaxis_title="Proportion",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig_high.update_traces(showlegend=True)
fig_high.update_traces(marker_showscale=False)
fig_high.update_xaxes(
showgrid=True,
ticks="outside",
tickson="boundaries",
ticklen=1,
tickmode='linear'
)
for axis in fig_high.layout:
if type(fig_high.layout[axis]) == go.layout.XAxis:
fig_high.layout[axis].title.text = ''
fig_high.update_layout(
# keep the original annotations and append a (currently empty) rotated y-axis label:
annotations=list(fig_high.layout.annotations) + [
go.layout.Annotation(
x=-0.07,
y=0.5,
font=dict(size=14),
showarrow=False,
text="",
textangle=-90,
xref="paper",
yref="paper"
)
]
)
fig_high.update_layout(yaxis={'visible': True, 'showticklabels': True})
fig_high.update_layout(xaxis={'categoryorder':'total descending'})
fig = px.bar(cb_xf, x='Option_y', y='prop', height=1080, color='Description_y')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
title="Variable LE3.199-201: Opinion on Likert Items (N={})".format(len(select_df.id.unique())),
xaxis_title="Category",
yaxis_title="Sum of all proportions",
legend_title="Options",
font=dict(
family="Arial, sans-serif",
size=12,
color="RebeccaPurple"
),
barmode="stack",
)
fig.update_yaxes(
title_text = "Sum of all proportions <br> (range[0:1], for each grouping)",
title_standoff = 50,
tickmode='linear')
fig.update_xaxes(
title_text = "Items",
title_standoff = 50,
tickmode='linear')
fig.show()
maindf
| | id | Description | Option |
|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score |
| 1 | 1 | GK Score | High GK Score |
| 2 | 3 | GK Score | High GK Score |
| 3 | 5 | GK Score | Low GK Score |
| 4 | 14 | GK Score | Low GK Score |
| ... | ... | ... | ... |
| 6952 | 1875 | Genetic Curiosity | Low Genetic Curiosity |
| 6953 | 1885 | Genetic Curiosity | Medium Genetic Curiosity |
| 6954 | 1886 | Genetic Curiosity | Medium Genetic Curiosity |
| 6955 | 1887 | Genetic Curiosity | Medium Genetic Curiosity |
| 6956 | 1888 | Genetic Curiosity | Low Genetic Curiosity |
6957 rows × 3 columns
xdf = new_large_df
## Filter out blank groups, then keep group 24 (interest in genetic information)
xdf["Group"] = xdf["Group"].map(str)
filter = xdf["Group"] != ' '
xdf = xdf[filter]
select = ['24']
xdf = xdf[xdf['Group'].isin(select)]
xdf['Option'] = xdf['Option']+' - '+xdf['value']
xdf = xdf.drop(columns=['Composite', 'Progress', 'UserLanguage', 'Collection',
                        'Variable', 'Tag', 'value', 'level_0', 'index'])
ndf = xdf
curiosity_df= pd.DataFrame(ndf).reset_index()
del curiosity_df['index']
curiosity_df.head(3)
| | id | Description | Option | Group |
|---|---|---|---|---|
| 0 | 0 | Would you be interested in finding out about g... | Future spouse or partner - Most Likely | 24 |
| 1 | 1 | Would you be interested in finding out about g... | Future spouse or partner - Under certain circu... | 24 |
| 2 | 3 | Would you be interested in finding out about g... | Future spouse or partner - Definitely | 24 |
en_df = nspecialdf[nspecialdf['id'].isin(list_ids)]
en_df.head(3)
| | id | Description_x | Option_x | index | Description_y | Option_y | Variable | Group | Option |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Low GK Score Other |
| 1 | 0 | Gender | Female Participants | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Female Participants Other |
| 2 | 0 | Age | Older Participants | 26686 | What concerns do participants have in relation... | Other | LE2.130 | 27 | Older Participants Other |
nen_df = en_df[['id', 'Description_x', 'Option', 'Variable', 'Group']].copy().reset_index()
del nen_df['index']
nen_df.columns = ['id', 'Description', 'Option', 'Variable', 'Group']
nen_df.head(3)
| | id | Description | Option | Variable | Group |
|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score Other | LE2.130 | 27 |
| 1 | 0 | Gender | Female Participants Other | LE2.130 | 27 |
| 2 | 0 | Age | Older Participants Other | LE2.130 | 27 |
list_ids = list(maindf.id.unique())
ndf_27['id'] =ndf_27['id'].apply(int)
con_27 = ndf_27[ndf_27['id'].isin(list_ids)]
del con_27['index']
con_27
| | id | Description | Option | Variable | Group |
|---|---|---|---|---|---|
| 8846 | 1 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8847 | 3 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8852 | 14 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8882 | 131 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| 8884 | 134 | What concerns do participants have in relation... | Do not know who will have access to that infor... | LE2.122 | 27 |
| ... | ... | ... | ... | ... | ... |
| 12732 | 1271 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12734 | 1346 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12736 | 1499 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12738 | 1602 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
| 12739 | 1645 | What concerns do participants have in relation... | Other | LE2.130 | 27 |
2088 rows × 5 columns
select = ['3', '4', '5']
select_df = megadf[megadf['Group'].isin(select)]
select_df.head(3)
ndf = select_df
ndf.head(5)
ndf.Description = ndf.Description.str.replace('Dissemination of genetic knowledge to the general public', 'Dissemination of GK')
ndf.Description = ndf.Description.str.replace('Policymaking – Contributing to working groups concerning the regulation of genetic data', 'Policymaking')
ndf.Description = ndf.Description.str.replace('Revising and updating ethical guidelines concerning genetic research and use of genetic data', 'Revising and updating')
likert_df = pd.DataFrame(ndf).reset_index()
del likert_df['index']
likert_df.head(3)
| | id | Description | Option | Variable | Group |
|---|---|---|---|---|---|
| 0 | 0 | Dissemination of GK | Strongly disagree to dissemination of GK | LE3.199 | 3 |
| 1 | 1 | Dissemination of GK | Agree to dissemination of GK | LE3.199 | 3 |
| 2 | 5 | Dissemination of GK | Agree to dissemination of GK | LE3.199 | 3 |
maindf['Group'] = '77'
maindf['Variable'] = 'Class_X'
maindf
| | id | Description | Option | Group | Variable |
|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | 77 | Class_X |
| 1 | 1 | GK Score | High GK Score | 77 | Class_X |
| 2 | 3 | GK Score | High GK Score | 77 | Class_X |
| 3 | 5 | GK Score | Low GK Score | 77 | Class_X |
| 4 | 14 | GK Score | Low GK Score | 77 | Class_X |
| ... | ... | ... | ... | ... | ... |
| 6952 | 1875 | Genetic Curiosity | Low Genetic Curiosity | 77 | Class_X |
| 6953 | 1885 | Genetic Curiosity | Medium Genetic Curiosity | 77 | Class_X |
| 6954 | 1886 | Genetic Curiosity | Medium Genetic Curiosity | 77 | Class_X |
| 6955 | 1887 | Genetic Curiosity | Medium Genetic Curiosity | 77 | Class_X |
| 6956 | 1888 | Genetic Curiosity | Low Genetic Curiosity | 77 | Class_X |
6957 rows × 5 columns
nen_df
| | id | Description | Option | Variable | Group |
|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score Other | LE2.130 | 27 |
| 1 | 0 | Gender | Female Participants Other | LE2.130 | 27 |
| 2 | 0 | Age | Older Participants Other | LE2.130 | 27 |
| 3 | 0 | Confidence in GK | Low GK Confidence Other | LE2.130 | 27 |
| 4 | 0 | Related/ Not related to law | Participants not related to law Other | LE2.130 | 27 |
| ... | ... | ... | ... | ... | ... |
| 18787 | 1888 | Concern | Medium Concern I would rather not know of any ... | LE2.124 | 27 |
| 18788 | 1888 | Concern | Medium Concern I am concerned my data will be ... | LE2.129 | 27 |
| 18789 | 1888 | Genetic Curiosity | Low Genetic Curiosity Do not know who will hav... | LE2.122 | 27 |
| 18790 | 1888 | Genetic Curiosity | Low Genetic Curiosity I would rather not know ... | LE2.124 | 27 |
| 18791 | 1888 | Genetic Curiosity | Low Genetic Curiosity I am concerned my data w... | LE2.129 | 27 |
18792 rows × 5 columns
bndf = pd.concat([maindf, curiosity_df, con_27, likert_df]).reset_index()
del bndf['index']
nbndf = bndf
nbndf.head(3)
| | id | Description | Option | Group | Variable |
|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | 77 | Class_X |
| 1 | 1 | GK Score | High GK Score | 77 | Class_X |
| 2 | 3 | GK Score | High GK Score | 77 | Class_X |
nbndf.Description.unique()
array(['GK Score', 'Gender', 'Age', 'Confidence in GK',
'Related/ Not related to law', 'Students/ Non Students',
'Law or Non Law Students and Non Students', 'Concern',
'Genetic Curiosity',
'Would you be interested in finding out about genetic information',
'What concerns do participants have in relation to genetic testing',
'Dissemination of GK', 'Policymaking', 'Revising and updating'],
dtype=object)
nbndf['Group'] = nbndf['Group'].map(str)
nbndf.Group.unique()
array(['77', '24', '27', '3', '4', '5'], dtype=object)
options = nbndf.Group.unique()
ranges = list(range(0, len(options)))
# get categorical codes
categories = dict(zip(options,ranges))
categories
{'77': 0, '24': 1, '27': 2, '3': 3, '4': 4, '5': 5}
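The dict(zip(...)) mapping built above is equivalent to pandas' own categorical codes; a small sketch (toy labels mirroring the group values) shows the equivalence:

```python
import pandas as pd

# Toy group labels in first-seen order (same idea as nbndf.Group.unique())
groups = pd.Series(["77", "24", "27", "3", "4", "5"])

# dict(zip(options, range(len(options)))) ...
manual = dict(zip(groups.unique(), range(len(groups.unique()))))

# ... matches Categorical codes when categories are listed in the same order
cat = pd.Categorical(groups, categories=groups.unique())
auto = {k: int(v) for k, v in zip(groups, cat.codes)}
```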
BNdf = nbndf
BNdf
| | id | Description | Option | Group | Variable |
|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | 77 | Class_X |
| 1 | 1 | GK Score | High GK Score | 77 | Class_X |
| 2 | 3 | GK Score | High GK Score | 77 | Class_X |
| 3 | 5 | GK Score | Low GK Score | 77 | Class_X |
| 4 | 14 | GK Score | Low GK Score | 77 | Class_X |
| ... | ... | ... | ... | ... | ... |
| 19046 | 1875 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
| 19047 | 1885 | Revising and updating | Agree to Revising and Updating | 5 | LE3.201 |
| 19048 | 1886 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
| 19049 | 1887 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
| 19050 | 1888 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
19051 rows × 5 columns
BNdf.Group.unique()
array(['77', '24', '27', '3', '4', '5'], dtype=object)
BNdf.Description.unique()
array(['GK Score', 'Gender', 'Age', 'Confidence in GK',
'Related/ Not related to law', 'Students/ Non Students',
'Law or Non Law Students and Non Students', 'Concern',
'Genetic Curiosity',
'Would you be interested in finding out about genetic information',
'What concerns do participants have in relation to genetic testing',
'Dissemination of GK', 'Policymaking', 'Revising and updating'],
dtype=object)
BNdf['Group'] = BNdf['Group'].map(str)
select= ['77', '24', '27', '8', '3', '4', '5']
nndf = BNdf[BNdf['Group'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt=pd.DataFrame()
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
import ast
# writing operations: per-participant value counts of the selected option codes
wo = []
for i in range(len(counts['xcodes'])):
    wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
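The quote-join/literal_eval round trip above can be reproduced with a plain string split; this sketch (hypothetical code strings, not the survey data) yields the same per-participant value counts:

```python
import pandas as pd

# Toy per-participant code strings, as produced by the groupby/join step
codes = pd.Series(["'10','11','10'", "'12'"])

# Strip quotes and split instead of parsing with ast.literal_eval
vc = (codes.str.replace("'", "", regex=False)
           .str.split(",")
           .apply(lambda x: pd.Series(x).value_counts()))
```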
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
    """Return consecutive (source, target) pairs from each path in `seq`."""
    pairs = []
    for path in [list(map(int, x)) for x in seq]:
        for j, k in zip(path, path[1:]):
            pairs.append((j, k))
    return pairs
# get a path graph
y = []
for i in range(len(path_list)):
y.append(list(path_list[i].split(',')))
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)]  # drop edges that start at a participant id
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif
| | 1 | 2 | counts | 3 | 4 | label |
|---|---|---|---|---|---|---|
| 0 | 1351 | 1353 | 1 | Low GK Score | Female Participants | Low GK Score Female Participants |
| 1 | 1351 | 1355 | 1 | Low GK Score | Older Participants | Low GK Score Older Participants |
| 2 | 1351 | 1360 | 1 | Low GK Score | Participants related to law | Low GK Score Participants related to law |
| 3 | 1351 | 1366 | 1 | Low GK Score | Medium Concern | Low GK Score Medium Concern |
| 4 | 1351 | 1371 | 1 | Low GK Score | Future spouse or partner - Most Likely | Low GK Score Future spouse or partner - Most L... |
| ... | ... | ... | ... | ... | ... | ... |
| 454 | 1422 | 1411 | 1 | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Neutral towards to Revising and Updating Neutr... |
| 455 | 1422 | 1414 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... |
| 456 | 1422 | 1415 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... |
| 457 | 1422 | 1416 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... |
| 458 | 1422 | 1417 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... |
459 rows × 6 columns
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = pd.DataFrame(fif['connections'].unique())
import random
# generate one random hex colour per unique connection
amount = len(fif['connections'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
| | 1 | 2 | counts | 3 | 4 | label | connections | colour |
|---|---|---|---|---|---|---|---|---|
| 0 | 1351 | 1353 | 1 | Low GK Score | Female Participants | Low GK Score Female Participants | 1351 1353 | #3c4900 |
| 1 | 1351 | 1355 | 1 | Low GK Score | Older Participants | Low GK Score Older Participants | 1351 1355 | #a450b0 |
| 2 | 1351 | 1360 | 1 | Low GK Score | Participants related to law | Low GK Score Participants related to law | 1351 1360 | #ee101e |
| 3 | 1351 | 1366 | 1 | Low GK Score | Medium Concern | Low GK Score Medium Concern | 1351 1366 | #5ac30f |
| 4 | 1351 | 1371 | 1 | Low GK Score | Future spouse or partner - Most Likely | Low GK Score Future spouse or partner - Most L... | 1351 1371 | #b5a5ac |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 454 | 1422 | 1411 | 1 | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Neutral towards to Revising and Updating Neutr... | 1422 1411 | #67d29c |
| 455 | 1422 | 1414 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... | 1422 1414 | #7809c5 |
| 456 | 1422 | 1415 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... | 1422 1415 | #eef785 |
| 457 | 1422 | 1416 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... | 1422 1416 | #ed65f5 |
| 458 | 1422 | 1417 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... | 1422 1417 | #10cd85 |
459 rows × 8 columns
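Random sampling can produce near-identical colours for different connections; a deterministic alternative (a sketch, not what the notebook uses) spaces hues evenly:

```python
import colorsys

def distinct_colours(n):
    """Evenly spaced hues give n distinct hex colours, deterministically."""
    out = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / max(n, 1), 0.65, 0.9)
        out.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return out
```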
def nodify(node_names):
    # unique name beginnings
    ends = sorted(list(set([e[0] for e in node_names])))
    # horizontal interval between node columns
    steps = 1/4
    # x-value for each unique name beginning, used as node position
    nodes_x = {}
    xVal = 0
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form
    x_values = [nodes_x[n[0]] for n in node_names]
    y_values = [x * 0.03 for x in range(1, len(x_values) + 1)]
    return x_values, y_values
### drop connections that occur only once
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 1]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
fig.update_layout(title='Sankey plot',
                  # other options for the plot
                  hoverlabel=dict(font=dict(family='sans-serif', size=12)))
fig = fig.update_layout(margin=dict(t=100))
fig.write_html("/home/manu10/Downloads/iglas_work/BIG_sankey.html")
fig.show()
####### GET SOME SIGNIFICANT PATHS, options occurring together
nfif = fif[fif['counts'] > 0]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax.id = 1
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
rnxn
#render dataframe as html
html = rnxn.to_html()
#write html to file
text_file = open("PATHS_RNXN_ALL_GR.html", "w")
text_file.write(html)
text_file.close()
rnxn
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 258 | 1378 | 1374 | 139 | Spouse or partner - Never | Future spouse or partner - Never | Spouse or partner - Never Future spouse or par... | 1378 1374 | #976435 | 158 | 193 | 0.879747 | 0.720207 | 0.633600 |
| 441 | 1420 | 1414 | 200 | Strongly agree to Revising and Updating | Strongly agree to Policymaking | Strongly agree to Revising and Updating Strong... | 1420 1414 | #ae1fbc | 231 | 278 | 0.865801 | 0.719424 | 0.622878 |
| 435 | 1418 | 1413 | 33 | Strongly disagree to Revising and Updating | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating Str... | 1418 1413 | #32f50f | 41 | 45 | 0.804878 | 0.733333 | 0.590244 |
| 445 | 1419 | 1415 | 254 | Agree to Revising and Updating | Agree to Policymaking | Agree to Revising and Updating Agree to Policy... | 1419 1415 | #c1bb11 | 353 | 318 | 0.719547 | 0.798742 | 0.574732 |
| 345 | 1388 | 1384 | 282 | Other relatives - Never | Siblings - Never | Other relatives - Never Siblings - Never | 1388 1384 | #9759f1 | 416 | 334 | 0.677885 | 0.844311 | 0.572346 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 0 | 1351 | 1353 | 1 | Low GK Score | Female Participants | Low GK Score Female Participants | 1351 1353 | #3c4900 | 496 | 497 | 0.002016 | 0.002012 | 0.000004 |
| 66 | 1405 | 1389 | 1 | I am worried some information about my physica... | Other relatives - Under certain circumstances | I am worried some information about my physica... | 1405 1389 | #eb72dd | 448 | 557 | 0.002232 | 0.001795 | 0.000004 |
| 60 | 1351 | 1389 | 1 | Low GK Score | Other relatives - Under certain circumstances | Low GK Score Other relatives - Under certain c... | 1351 1389 | #a85807 | 496 | 557 | 0.002016 | 0.001795 | 0.000004 |
| 67 | 1406 | 1389 | 1 | I am concerned my data will be used for other ... | Other relatives - Under certain circumstances | I am concerned my data will be used for other ... | 1406 1389 | #ffa840 | 498 | 557 | 0.002008 | 0.001795 | 0.000004 |
| 315 | 1392 | 1379 | 1 | Friends - Never | Children - Definitely | Friends - Never Children - Definitely | 1392 1379 | #4a80ba | 705 | 430 | 0.001418 | 0.002326 | 0.000003 |
459 rows × 13 columns
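p1p2 multiplies the two directional proportions (the edge count over each endpoint's total), giving a symmetric co-occurrence strength in [0, 1]; a toy check with hypothetical counts:

```python
import pandas as pd

# Hypothetical edge: 5 participants chose both options,
# option A was chosen 10 times overall and option B 20 times
edge = pd.DataFrame({"counts": [5], "id": [10], "idx": [20]})
edge["p1"] = edge["counts"] / edge["id"]    # share of A-choosers who also chose B
edge["p2"] = edge["counts"] / edge["idx"]   # share of B-choosers who also chose A
edge["p1p2"] = edge["p1"] * edge["p2"]      # symmetric strength
```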
### WEIGHTED COUNT SORTING
nrnxn = rnxn.copy()
nrnxn['counts_w'] = nrnxn['counts']*nrnxn['p1p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
nrnxn.sort_values(['counts_w'], ascending=False, inplace=True)
nrnxn
#render dataframe as html
html = nrnxn.to_html()
#write html to file
text_file = open("PATHS_RNXN_WEIGHT_SORTED_ALL_GR.html", "w")
text_file.write(html)
text_file.close()
# Node statistics
nrnxn['prevalence'] = nrnxn['counts']/(nrnxn['id']+nrnxn['idx'])
g1 = nrnxn.groupby([3])['prevalence'].sum().sort_values(ascending=False).reset_index()
g2 = nrnxn.groupby([3])['p1p2'].sum().sort_values(ascending=False).reset_index()
g3 = nrnxn.groupby([3])['counts_w'].sum().sort_values(ascending=False).reset_index()
g4 = nrnxn.groupby([3])['counts'].sum().sort_values(ascending=False).reset_index()
from functools import reduce
data_frames = [g1, g2, g3, g4]
combined_all_gr = reduce(lambda left, right: pd.merge(left, right, on=[3], how='outer'), data_frames)
combined_all_gr.columns = ['node', 'prevalence', 'strength', 'weighted_counts', 'total_counts']
combined_all_gr
| | node | prevalence | strength | weighted_counts | total_counts |
|---|---|---|---|---|---|
| 0 | I am worried some information about my physica... | 0.654218 | 0.417039 | 50.466692 | 448 |
| 1 | Friends - Never | 0.611865 | 0.683136 | 229.600434 | 705 |
| 2 | Students | 0.598142 | 0.779883 | 216.738305 | 561 |
| 3 | Younger Participants | 0.597231 | 0.775256 | 247.273474 | 599 |
| 4 | Other relatives - Under certain circumstances | 0.579877 | 0.596097 | 191.313179 | 557 |
| ... | ... | ... | ... | ... | ... |
| 67 | Other | 0.106457 | 0.015975 | 0.092235 | 32 |
| 68 | Other - Definitely | 0.096147 | 0.026690 | 0.092426 | 15 |
| 69 | Other - Most Likely | 0.091306 | 0.041325 | 0.357458 | 20 |
| 70 | Low GK Score | 0.020857 | 0.000174 | 0.000174 | 16 |
| 71 | High GK Score | 0.016215 | 0.000547 | 0.000547 | 6 |
72 rows × 5 columns
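The reduce-over-merge pattern that combines the four node-statistic frames works for any list of frames sharing a key; a minimal sketch with hypothetical columns:

```python
from functools import reduce
import pandas as pd

# Two toy statistic frames keyed on 'node' (hypothetical values)
g_a = pd.DataFrame({"node": ["x", "y"], "m1": [1, 2]})
g_b = pd.DataFrame({"node": ["x", "z"], "m2": [3, 4]})

# Outer-merge every frame onto the running result, keeping all nodes
combined = reduce(lambda left, right: pd.merge(left, right, on="node", how="outer"),
                  [g_a, g_b])
```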
#render dataframe as html
html = combined_all_gr.to_html()
#write html to file
text_file = open("PATHS_RNXN_NODE_STATS_ALL_GR.html", "w")
text_file.write(html)
text_file.close()
from pyvis.network import Network
import pandas as pd
got_net = Network(height='1080px', width='100%', bgcolor='#ffffff', font_color='black', directed=False)
# set the physics layout of the network
# got_net.barnes_hut()
got_data = nrnxn
got_data = got_data[got_data['p1p2'] >= 0.1]
sources = got_data[3]
targets = got_data[4]
weights_edges = got_data['p1p2'].round(3)
weights_n1 = got_data['p1'].round(3)
weights_n2 = got_data['p2'].round(3)
colours = got_data['colour']
edge_data = zip(sources, targets, weights_edges, weights_n1, weights_n2, colours)
for e in edge_data:
    src, dst, we, wn1, wn2, c = e
    got_net.add_node(src, src, title=src, value=wn1, color=c)
    got_net.add_node(dst, dst, title=dst, value=wn2, color=c)
    got_net.add_edge(src, dst, value=we, color=c)
neighbor_map = got_net.get_adj_list()
edges = got_net.get_edges()
nodes=got_net.get_nodes()
N_nodes=len(nodes)
N_edges=len(edges)
weights=[[] for i in range(N_nodes)]
# Associating weights to neighbors
for i in range(N_nodes):  # loop through nodes
    for neighbor in neighbor_map[nodes[i]]:  # and their neighbors
        for j in range(N_edges):  # weights on the edge between node and neighbor
            if (edges[j]['from'] == nodes[i] and edges[j]['to'] == neighbor) or \
               (edges[j]['from'] == neighbor and edges[j]['to'] == nodes[i]):
                weights[i].append(edges[j]['value'])
for node, i in zip(got_net.nodes, range(N_nodes)):
    node['value'] = len(neighbor_map[node['id']])
    node['weight'] = [str(weights[i][k]) for k in range(len(weights[i]))]
    list_neighbor = list(neighbor_map[node['id']])
    # Concatenating neighbors and weights
    hover_str = [list_neighbor[k] + ' ' + node['weight'][k] for k in range(node['value'])]
    # Setting up node title for hovering
    node['title'] += ' Neighbors:<br>' + '<br>'.join(hover_str)
got_net.show_buttons(filter_=['physics'])
got_net.show('allnet_GR.html')
###### Top paths
#### Top paths
paths = pd.DataFrame(y)
#paths = paths.drop(0, axis=1)
paths[0] = 1
paths.fillna(value='', inplace = True)
paths['path'] = paths[paths.columns[2:]].apply(
lambda x: ','.join(x.dropna().astype(str)),
axis=1
)
inv_map = {str(v): str(k) for k, v in categories.items()}
paths['name'] = paths[paths.columns[2:]].apply(
lambda x: ','.join(x.map(inv_map).dropna().astype(str)),
axis=1
)
paths['source'] = paths[1].map(inv_map)
npaths = paths.groupby(['source', 'path', 'name'])[0].sum().reset_index()
npaths = npaths[npaths[0] > 5]
npaths['count'] = npaths[0]
npaths = npaths.sort_values(by='count', ascending=False)
npaths.head(n=20)
| | source | path | name | 0 | count |
|---|---|---|---|---|---|
| 209 | Future spouse or partner - Never | 1378,1382,1384,1388,1392,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Never,Sib... | 29 | 29 |
| 261 | Future spouse or partner - Under certain circu... | 1376,1380,1385,1389,1392,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 20 | 20 |
| 196 | Future spouse or partner - Never | 1378,1380,1384,1388,1392,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Under cer... | 18 | 18 |
| 259 | Future spouse or partner - Under certain circu... | 1376,1380,1385,1389,1391,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 15 | 15 |
| 251 | Future spouse or partner - Under certain circu... | 1376,1380,1384,1388,1392,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 14 | 14 |
| 15 | Future spouse or partner - Definitely | 1377,1379,1383,1387,1394,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Definitely,Children - Defi... | 13 | 13 |
| 275 | Future spouse or partner - Under certain circu... | 1376,1381,1385,1389,1392,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 10 | 10 |
| 105 | Future spouse or partner - Most Likely | 1375,1381,1385,1389,1391,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Most Likely,Children - Mos... | 9 | 9 |
| 210 | Future spouse or partner - Never | 1378,1382,1384,1388,1392,1395,,,,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Never,Sib... | 9 | 9 |
| 203 | Future spouse or partner - Never | 1378,1380,1385,1389,1392,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Under cer... | 8 | 8 |
| 107 | Future spouse or partner - Most Likely | 1375,1381,1385,1389,1392,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Most Likely,Children - Mos... | 6 | 6 |
| 113 | Future spouse or partner - Most Likely | 1375,1381,1386,1389,1391,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Most Likely,Children - Mos... | 6 | 6 |
| 118 | Future spouse or partner - Most Likely | 1375,1381,1386,1390,1391,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Most Likely,Children - Mos... | 6 | 6 |
| 274 | Future spouse or partner - Under certain circu... | 1376,1381,1385,1389,1391,,,,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 6 | 6 |
#render dataframe as html
html = npaths.to_html()
# write html to file
with open("PATHS_XOR_ALL_GR.html", "w") as text_file:
    text_file.write(html)
nndf
| | id | Description | Option | Group | Variable |
|---|---|---|---|---|---|
| 0 | 0 | GK Score | Low GK Score | 77 | Class_X |
| 1 | 1 | GK Score | High GK Score | 77 | Class_X |
| 2 | 3 | GK Score | High GK Score | 77 | Class_X |
| 3 | 5 | GK Score | Low GK Score | 77 | Class_X |
| 4 | 14 | GK Score | Low GK Score | 77 | Class_X |
| ... | ... | ... | ... | ... | ... |
| 19046 | 1875 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
| 19047 | 1885 | Revising and updating | Agree to Revising and Updating | 5 | LE3.201 |
| 19048 | 1886 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
| 19049 | 1887 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
| 19050 | 1888 | Revising and updating | Strongly agree to Revising and Updating | 5 | LE3.201 |
19051 rows × 5 columns
select= ['77', '24', '27', '8', '3', '4', '5']
nndf = BNdf[BNdf['Group'].isin(select)]
select = [
'0 Low confidence Confidence profile',
'0 High confident Confidence profile', '0 Non law Legal',
'0 Law Legal', '0 Student student', '0 Not student student',
'0 Other branch branch', '0 Not a student branch',
'0 Law branch branch', '0 Low concern', '0 Medium concern',
'0 High concern', '0 High curiosity', '0 Low curiosity',
'0 Medium curiosity'] # exclude these zero-coded reference options (keeps e.g. age and GK)
nndf['Option'] = nndf['Option'].map(str)
nndf = nndf[~nndf['Option'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# per-participant value counts of option codes
import ast
wo = []
for i in range(len(counts['xcodes'])):
    wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
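The `ast.literal_eval` step above is compact but opaque: each participant's codes arrive as one quoted, comma-separated string, which `literal_eval` parses into a tuple before `value_counts()` tallies repeats. A minimal sketch with a made-up code string:

```python
import ast

import pandas as pd

# hypothetical participant code string, as produced by the groupby join above
code_string = "'1375','1380','1375'"

# literal_eval parses the quoted string into a tuple of codes
parsed = pd.Series([code_string]).apply(ast.literal_eval)
# expand the tuple into one row per code, then count occurrences
code_counts = parsed.apply(lambda x: pd.Series(x)).stack().value_counts()
```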
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
    """Return consecutive (source, target) pairs from each integer-coded path in `seq`."""
    seq_int = [list(map(int, x)) for x in seq]
    x = []
    y = []
    for i in seq_int:
        for j, k in zip(i, i[1:]):
            x.append(j)
            y.append(k)
    return list(zip(x, y))
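Restated compactly (`path_edges` is an equivalent sketch, not the notebook's exact function), `zigzag` turns each code path into consecutive source/target edge pairs:

```python
def path_edges(seq):
    """Equivalent restatement of zigzag(): consecutive (source, target)
    pairs from each integer-coded path."""
    pairs = []
    for path in [list(map(int, p)) for p in seq]:
        pairs.extend(zip(path, path[1:]))
    return pairs

# a toy path: participant id 10 followed by two option codes
edges = path_edges([['10', '1375', '1380']])
# consecutive pairs: (10, 1375) and (1375, 1380)
```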
# get a path graph
y = [p.split(',') for p in path_list]
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] #remove the participant id initials
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif
| | 1 | 2 | counts | 3 | 4 | label |
|---|---|---|---|---|---|---|
| 0 | 1351 | 1353 | 1 | Low GK Score | Female Participants | Low GK Score Female Participants |
| 1 | 1351 | 1355 | 1 | Low GK Score | Older Participants | Low GK Score Older Participants |
| 2 | 1351 | 1360 | 1 | Low GK Score | Participants related to law | Low GK Score Participants related to law |
| 3 | 1351 | 1366 | 1 | Low GK Score | Medium Concern | Low GK Score Medium Concern |
| 4 | 1351 | 1371 | 1 | Low GK Score | Future spouse or partner - Most Likely | Low GK Score Future spouse or partner - Most L... |
| ... | ... | ... | ... | ... | ... | ... |
| 454 | 1422 | 1411 | 1 | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Neutral towards to Revising and Updating Neutr... |
| 455 | 1422 | 1414 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... |
| 456 | 1422 | 1415 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... |
| 457 | 1422 | 1416 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... |
| 458 | 1422 | 1417 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... |
459 rows × 6 columns
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = fif['connections'].unique()
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(amount):
    # lower bound fixed at 0 (using `i` would shrink the colour range as i grows)
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
print(colour)
(output: the generated list of random hex colour strings, omitted)
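The palette above is unseeded, so it changes on every run; a seeded sketch (`random_palette` is an illustrative helper, not part of the notebook) keeps the colours reproducible across re-executions:

```python
import random

def random_palette(n, seed=42):
    """Generate n reproducible pseudo-random hex colour strings."""
    rng = random.Random(seed)
    return ["#%06x" % rng.randint(0, 0xFFFFFF) for _ in range(n)]

palette = random_palette(459)
```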
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
def nodify(node_names):
    """Return x/y positions for Sankey nodes, grouped by the first character of each name."""
    # unique name beginnings
    begins = sorted(list(set([e[0] for e in node_names])))
    # x-interval between groups
    steps = 1/4
    # x-value for each unique name beginning, used as node position
    nodes_x = {}
    xVal = 0
    for e in begins:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form
    x_values = [nodes_x[n[0]] for n in node_names]
    y_values = [x * 0.03 for x in range(1, len(x_values) + 1)]
    return x_values, y_values
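The positioning intent (one x column per distinct leading character, evenly spaced y positions down the column) can be sketched standalone; `demo_nodify` below is an illustrative reimplementation, not the notebook's function:

```python
def demo_nodify(node_names, step=0.25):
    """Assign one x position per distinct first character of each name,
    and evenly spaced y positions for every node."""
    firsts = sorted({name[0] for name in node_names})
    x_for = {c: i * step for i, c in enumerate(firsts)}
    x_values = [x_for[name[0]] for name in node_names]
    y_values = [0.03 * (i + 1) for i in range(len(node_names))]
    return x_values, y_values

# two names beginning with 'A' share a column; 'B' starts the next one
xs, ys = demo_nodify(['Apple', 'Avocado', 'Banana'])
```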
####### GET SOME SIGNIFICANT PATHS, options occurring together
nfif = fif[fif['counts'] > 1]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax.id = 1
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
rnxn.head(20)
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 84 | 1378 | 1374 | 139 | Spouse or partner - Never | Future spouse or partner - Never | Spouse or partner - Never Future spouse or par... | 1378 1374 | #136c8f | 158 | 193 | 0.879747 | 0.720207 | 0.633600 |
| 318 | 1420 | 1414 | 200 | Strongly agree to Revising and Updating | Strongly agree to Policymaking | Strongly agree to Revising and Updating Strong... | 1420 1414 | #898d97 | 231 | 278 | 0.865801 | 0.719424 | 0.622878 |
| 312 | 1418 | 1413 | 33 | Strongly disagree to Revising and Updating | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating Str... | 1418 1413 | #6f0799 | 41 | 45 | 0.804878 | 0.733333 | 0.590244 |
| 322 | 1419 | 1415 | 254 | Agree to Revising and Updating | Agree to Policymaking | Agree to Revising and Updating Agree to Policy... | 1419 1415 | #1f95af | 353 | 318 | 0.719547 | 0.798742 | 0.574732 |
| 138 | 1388 | 1384 | 282 | Other relatives - Never | Siblings - Never | Other relatives - Never Siblings - Never | 1388 1384 | #c208cf | 416 | 334 | 0.677885 | 0.844311 | 0.572346 |
| 227 | 1400 | 1399 | 273 | Do not know whether the data will be stored se... | Do not know who will have access to that infor... | Do not know whether the data will be stored se... | 1400 1399 | #ee95cc | 343 | 383 | 0.795918 | 0.712794 | 0.567326 |
| 78 | 1377 | 1373 | 193 | Spouse or partner - Definitely | Future spouse or partner - Definitely | Spouse or partner - Definitely Future spouse o... | 1377 1373 | #62ef67 | 293 | 232 | 0.658703 | 0.831897 | 0.547973 |
| 10 | 1357 | 1356 | 409 | Low GK Confidence | Younger Participants | Low GK Confidence Younger Participants | 1357 1356 | #172ae0 | 519 | 599 | 0.788054 | 0.682805 | 0.538087 |
| 23 | 1363 | 1361 | 292 | Non Law Students | Students | Non Law Students Students | 1363 1361 | #fe3188 | 292 | 561 | 1.000000 | 0.520499 | 0.520499 |
| 299 | 1414 | 1410 | 196 | Strongly agree to Policymaking | Strongly agree to dissemination of GK | Strongly agree to Policymaking Strongly agree ... | 1414 1410 | #2e446c | 278 | 276 | 0.705036 | 0.710145 | 0.500678 |
| 272 | 1406 | 1405 | 334 | I am concerned my data will be used for other ... | I am worried some information about my physica... | I am concerned my data will be used for other ... | 1406 1405 | #b9b888 | 498 | 448 | 0.670683 | 0.745536 | 0.500018 |
| 68 | 1376 | 1372 | 276 | Spouse or partner - Under certain circumstances | Future spouse or partner - Under certain circu... | Spouse or partner - Under certain circumstance... | 1376 1372 | #d3a7a4 | 372 | 410 | 0.741935 | 0.673171 | 0.499449 |
| 142 | 1389 | 1385 | 369 | Other relatives - Under certain circumstances | Siblings - Under certain circumstances | Other relatives - Under certain circumstances ... | 1389 1385 | #04dc76 | 557 | 499 | 0.662478 | 0.739479 | 0.489888 |
| 5 | 1356 | 1353 | 379 | Younger Participants | Female Participants | Younger Participants Female Participants | 1356 1353 | #ceb2f9 | 599 | 497 | 0.632721 | 0.762575 | 0.482498 |
| 24 | 1364 | 1361 | 269 | Law Students | Students | Law Students Students | 1364 1361 | #a19fc5 | 269 | 561 | 1.000000 | 0.479501 | 0.479501 |
| 295 | 1415 | 1409 | 227 | Agree to Policymaking | Agree to dissemination of GK | Agree to Policymaking Agree to dissemination o... | 1415 1409 | #74e568 | 318 | 339 | 0.713836 | 0.669617 | 0.477997 |
| 18 | 1361 | 1360 | 269 | Students | Participants related to law | Students Participants related to law | 1361 1360 | #20aeba | 561 | 270 | 0.479501 | 0.996296 | 0.477725 |
| 156 | 1392 | 1388 | 365 | Friends - Never | Other relatives - Never | Friends - Never Other relatives - Never | 1392 1388 | #37b8d4 | 705 | 416 | 0.517730 | 0.877404 | 0.454259 |
| 58 | 1375 | 1371 | 277 | Spouse or partner - Most Likely | Future spouse or partner - Most Likely | Spouse or partner - Most Likely Future spouse ... | 1375 1371 | #81e830 | 391 | 454 | 0.708440 | 0.610132 | 0.432242 |
| 288 | 1413 | 1408 | 33 | Strongly disagree to Policymaking | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking Strongly dis... | 1413 1408 | #c58c44 | 45 | 56 | 0.733333 | 0.589286 | 0.432143 |
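The co-occurrence score multiplies two conditional frequencies: p1 is the pair count over the first option's total, p2 the pair count over the second option's total, and p1p2 their product. A worked check against the top row above (counts 139, 158, 193):

```python
import pandas as pd

# figures echo the top row of the table above
pair = pd.DataFrame({'counts': [139], 'id': [158], 'idx': [193]})
pair['p1'] = pair['counts'] / pair['id']    # 139/158: share of first option's choosers
pair['p2'] = pair['counts'] / pair['idx']   # 139/193: share of second option's choosers
pair['p1p2'] = pair['p1'] * pair['p2']      # approx 0.6336, matching the table
```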
### filter: keep pairs occurring more than once
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 1]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
# title and hover settings folded into update_layout (a bare go.Layout is discarded)
fig = fig.update_layout(title='Sankey plot', margin=dict(t=100),
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)))
fig.write_html("/home/manu10/Downloads/iglas_work/SELECT_BIG_sankey.html")
fig.show()
### filter: keep pairs occurring more than 5 times
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 5]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
# title and hover settings folded into update_layout (a bare go.Layout is discarded)
fig = fig.update_layout(title='Sankey plot', margin=dict(t=100),
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)))
fig.write_html("/home/manu10/Downloads/iglas_work/5_FILTER_SELECT_BIG_sankey.html")
fig.show()
### filter: keep pairs occurring more than 10 times
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 10]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
# title and hover settings folded into update_layout (a bare go.Layout is discarded)
fig = fig.update_layout(title='Sankey plot', margin=dict(t=100),
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)))
fig.write_html("/home/manu10/Downloads/iglas_work/10_FILTER_SELECT_BIG_sankey.html")
fig.show()
### filter: keep pairs occurring more than 20 times
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 20]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
# title and hover settings folded into update_layout (a bare go.Layout is discarded)
fig = fig.update_layout(title='Sankey plot', margin=dict(t=100),
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)))
fig.write_html("/home/manu10/Downloads/iglas_work/20_FILTER_SELECT_BIG_sankey.html")
fig.show()
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.style.use('classic')
df_graph = nrnxn
df_graph['From'] = df_graph[3].map(str)+' '+ df_graph['counts'].map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts_w']
colors=cls['colour']
weights = df_graph['p1p2']
G = networkx.from_pandas_edgelist(
df_graph, source="From", target="To", edge_attr="Count"
)
# dynamic node sizes: scale each node by its degree
scale = 1  # multiplier applied to node degree
d = dict(G.degree)
# update dict with scaled degrees
d.update((x, scale * y) for x, y in d.items())
####
plt.figure(figsize=(40, 40))
plt.rcParams['figure.facecolor'] = 'white'
# draw_networkx returns None, so draw in place rather than reassigning G
networkx.draw_networkx(G, edge_color=colors, node_color='blue', alpha=1, node_size=100,
                       width=weights * 0.1, arrows=False, with_labels=True,
                       font_size=6, font_family='sans-serif')
plt.tight_layout()
plt.savefig('5_filter_EVERYTHING.png', dpi=500)
select= ['77', '24', '27', '8']
nndf = BNdf[BNdf['Group'].isin(select)]
select = [
'0 Low confidence Confidence profile',
'0 High confident Confidence profile', '0 Non law Legal',
'0 Law Legal', '0 Student student', '0 Not student student',
'0 Other branch branch', '0 Not a student branch',
'0 Law branch branch', '0 Low concern', '0 Medium concern',
'0 High concern', '0 High curiosity', '0 Low curiosity',
'0 Medium curiosity'] # exclude these zero-coded reference options (keeps e.g. age and GK)
nndf['Option'] = nndf['Option'].map(str)
nndf = nndf[~nndf['Option'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# per-participant value counts of option codes
wo = []
for i in range(len(counts['xcodes'])):
    wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
    """Return consecutive (source, target) pairs from each integer-coded path in `seq`."""
    seq_int = [list(map(int, x)) for x in seq]
    x = []
    y = []
    for i in seq_int:
        for j, k in zip(i, i[1:]):
            x.append(j)
            y.append(k)
    return list(zip(x, y))
# get a path graph
y = [p.split(',') for p in path_list]
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] #remove the participant id initials
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = fif['connections'].unique()
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(amount):
    # lower bound fixed at 0 (using `i` would shrink the colour range as i grows)
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
print(colour)
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
def nodify(node_names):
    # unique first characters of the node names: one Sankey column per character
    ends = sorted(set(e[0] for e in node_names))
    # horizontal interval between columns
    steps = 1/4
    # x-value for each unique name beginning, used as the node position
    nodes_x = {}
    xVal = 0
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form
    x_values = [nodes_x[n[0]] for n in node_names]
    y_values = [x * 0.03 for x in range(1, len(x_values) + 1)]
    return x_values, y_values
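As a quick illustration (a self-contained sketch with a hypothetical helper name, not part of the pipeline), the column-assignment idea in `nodify` can be checked on toy labels:

```python
# Hypothetical helper mirroring nodify()'s x-position logic: labels sharing a
# first character land in the same column, columns spaced 1/4 apart.
def column_positions(labels, step=0.25):
    firsts = sorted({lab[0] for lab in labels})
    col_x = {c: i * step for i, c in enumerate(firsts)}
    return [col_x[lab[0]] for lab in labels]

xs = column_positions(['A1', 'A2', 'B1', 'C1'])
# xs -> [0.0, 0.0, 0.25, 0.5]
```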
### filter out low-frequency links (keep counts > 10)
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 10]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
fig = fig.update_layout(title='Sankey plot',
                        # other options for the plot
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)),
                        margin=dict(t=100))
fig.write_html("/home/manu10/Downloads/iglas_work/10_FILTER_SELECT_Curiosity_concern_endeavours_sankey.html")
fig.show()
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
mpl.style.use('classic')
df_graph = fif[fif['counts'] > 5].copy()
df_graph['From'] = df_graph[3].map(str)+' '+ df_graph['counts'].map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts']
colors=cls['colour']
weights = df_graph['counts']
G = networkx.from_pandas_edgelist(
df_graph, source="From", target="To", edge_attr="Count"
)
####
# dynamic node sizes: scale each node's size by its degree
scale = 1
d = dict(G.degree)
#Updating dict
d.update((x, scale*y) for x, y in d.items())
####
plt.figure(figsize=(40,40))
plt.rcParams['figure.facecolor'] = 'white'
networkx.draw_networkx(G, edge_color=colors, node_color='blue', alpha=1, node_size=100,
                       width=weights*0.1, arrows=False, with_labels=True, font_size=6, font_family='sans-serif')
plt.tight_layout()
plt.savefig('10_filter_curious_concern_endeavours.png', dpi=500)
#### Top paths
paths = pd.DataFrame(y)
#paths = paths.drop(0, axis=1)
paths[0] = 1
paths.fillna(value='', inplace = True)
paths['path'] = paths[paths.columns[2:]].apply(
lambda x: ','.join(x.dropna().astype(str)),
axis=1
)
inv_map = {str(v): str(k) for k, v in categories.items()}
paths['name'] = paths[paths.columns[2:]].apply(
lambda x: ','.join(x.map(inv_map).dropna().astype(str)),
axis=1
)
paths['source'] = paths[1].map(inv_map)
npaths = paths.groupby(['source', 'path', 'name'])[0].sum().reset_index()
npaths = npaths[npaths[0] >1]
npaths['count'] = npaths[0]
npaths = npaths.sort_values(by='count', ascending=False)
npaths.head(n=10)
| | source | path | name | 0 | count |
|---|---|---|---|---|---|
| 209 | Future spouse or partner - Never | 1378,1382,1384,1388,1392,,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Never,Sib... | 29 | 29 |
| 261 | Future spouse or partner - Under certain circu... | 1376,1380,1385,1389,1392,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 20 | 20 |
| 196 | Future spouse or partner - Never | 1378,1380,1384,1388,1392,,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Under cer... | 18 | 18 |
| 259 | Future spouse or partner - Under certain circu... | 1376,1380,1385,1389,1391,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 15 | 15 |
| 251 | Future spouse or partner - Under certain circu... | 1376,1380,1384,1388,1392,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 14 | 14 |
| 15 | Future spouse or partner - Definitely | 1377,1379,1383,1387,1394,,,,,,,,,,,,,,,,, | Spouse or partner - Definitely,Children - Defi... | 13 | 13 |
| 275 | Future spouse or partner - Under certain circu... | 1376,1381,1385,1389,1392,,,,,,,,,,,,,,,,, | Spouse or partner - Under certain circumstance... | 10 | 10 |
| 210 | Future spouse or partner - Never | 1378,1382,1384,1388,1392,1395,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Never,Sib... | 9 | 9 |
| 105 | Future spouse or partner - Most Likely | 1375,1381,1385,1389,1391,,,,,,,,,,,,,,,,, | Spouse or partner - Most Likely,Children - Mos... | 9 | 9 |
| 203 | Future spouse or partner - Never | 1378,1380,1385,1389,1392,,,,,,,,,,,,,,,,, | Spouse or partner - Never,Children - Under cer... | 8 | 8 |
# render dataframe as html and write it to file
with open("PATHS_1_filter_curious_concern_endeavours.html", "w") as text_file:
    text_file.write(npaths.to_html())
# map colours to categories
import random
# generate random colours
amount = len(npaths['name'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
print(colour)
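A deterministic alternative (a sketch, not what the notebook uses): evenly spaced HSV hues give a palette of visually distinct colours with no risk of near-duplicate random draws.

```python
import colorsys

# Sketch of a deterministic palette: n evenly spaced hues, converted to hex.
def distinct_palette(n):
    colours = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        colours.append('#%02x%02x%02x' % (int(r * 255), int(g * 255), int(b * 255)))
    return colours

palette = distinct_palette(5)
```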
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
mpl.style.use('classic')
df_graph = npaths.copy()
df_graph['From'] = df_graph['source'].map(str)+' '+ df_graph['count'].map(str)
df_graph['To'] = df_graph['name']
df_graph['Count'] = df_graph['count']
colors=colour
weights = df_graph['count']
G = networkx.from_pandas_edgelist(
df_graph, source="From", target="To", edge_attr="Count"
)
####
# dynamic node sizes: scale each node's size by its degree
scale = 3
d = dict(G.degree)
#Updating dict
d.update((x, scale*y) for x, y in d.items())
####
plt.figure(figsize=(20,20))
plt.rcParams['figure.facecolor'] = 'white'
networkx.draw_networkx(G, pos=networkx.spring_layout(G), edge_color=colors, node_color='blue', alpha=1, node_size=100,
                       width=weights*0.1, arrows=False, with_labels=True, font_size=8, font_family='sans-serif')
plt.tight_layout()
#plt.savefig('PATHS_1_filter_curious_concern_endeavours.png', dpi=300)
Pairs examined: dissemination - policymaking; policymaking - revising and updating; revising and updating - dissemination
select= ['3', '4', '5']
nndf = BNdf[BNdf['Group'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# writing operations
wo = []
for i in range(len(counts['xcodes'])):
    wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
    """Return the list of consecutive (source, target) pairs within each sequence in `seq`."""
    seq_int = [list(map(int, x)) for x in seq]
    x = []
    y = []
    for i in seq_int:
        for j, k in zip(i, i[1:]):
            x.append(j)
            y.append(k)
    return list(zip(x, y))
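On a toy input, the pair expansion `zigzag` performs can be sanity-checked with a self-contained restatement of the same logic (hypothetical helper name):

```python
# Each path like ['1', '761', '766'] becomes the edges (1, 761) and (761, 766).
def consecutive_pairs(paths):
    out = []
    for p in paths:
        ints = [int(v) for v in p]
        out.extend(zip(ints, ints[1:]))
    return out

edges = consecutive_pairs([['1', '761', '766'], ['1', '762']])
# edges -> [(1, 761), (761, 766), (1, 762)]
```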
# split each comma-separated path string into a list of codes
y = [path.split(',') for path in path_list]
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] # drop participant-id nodes, keep only option codes
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = pd.DataFrame(fif['connections'].unique())
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
print(colour)
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
| | 1 | 2 | counts | 3 | 4 | label | connections | colour |
|---|---|---|---|---|---|---|---|---|
| 0 | 761 | 766 | 1 | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking | Strongly disagree to dissemination of GK Stron... | 761 766 | #c07a72 |
| 1 | 761 | 770 | 1 | Strongly disagree to dissemination of GK | Disagree to Policymaking | Strongly disagree to dissemination of GK Disag... | 761 770 | #27f11c |
| 2 | 761 | 773 | 1 | Strongly disagree to dissemination of GK | Strongly agree to Revising and Updating | Strongly disagree to dissemination of GK Stron... | 761 773 | #2596c3 |
| 3 | 762 | 764 | 1 | Agree to dissemination of GK | Neutral towards to dissemination of GK | Agree to dissemination of GK Neutral towards t... | 762 764 | #9559c6 |
| 4 | 762 | 769 | 1 | Agree to dissemination of GK | Neutral towards to Policymaking | Agree to dissemination of GK Neutral towards t... | 762 769 | #2346c0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 57 | 775 | 764 | 1 | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Neutral towards to Revising and Updating Neutr... | 775 764 | #4bb98b |
| 58 | 775 | 767 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... | 775 767 | #1b4c87 |
| 59 | 775 | 768 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... | 775 768 | #5d40d3 |
| 60 | 775 | 769 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... | 775 769 | #5f6053 |
| 61 | 775 | 770 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... | 775 770 | #d94e86 |
62 rows × 8 columns
### filter out low-frequency links (keep counts > 5)
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 5]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
fig = fig.update_layout(title='Sankey plot',
                        # other options for the plot
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)),
                        margin=dict(t=100))
fig.write_html("/home/manu10/Downloads/iglas_work/sankey_3_4_5.html")
fig.show()
def nodify(node_names):
    # unique first characters of the node names: one Sankey column per character
    ends = sorted(set(e[0] for e in node_names))
    # horizontal interval between columns
    steps = 1/4
    # x-value for each unique name beginning, used as the node position
    nodes_x = {}
    xVal = 0
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form
    x_values = [nodes_x[n[0]] for n in node_names]
    y_values = [x * 0.03 for x in range(1, len(x_values) + 1)]
    return x_values, y_values
# map colours to categories
import random
# generate random colours
amount = len(npaths['name'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
print(colour)
####### GET SOME SIGNIFICANT PATHS: options occurring together
nfif = fif[fif['counts'] > 1]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax.id = 1
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
rnxn
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 773 | 767 | 200 | Strongly agree to Revising and Updating | Strongly agree to Policymaking | Strongly agree to Revising and Updating Strong... | 773 767 | #eebc32 | 231 | 278 | 0.865801 | 0.719424 | 0.622878 |
| 24 | 771 | 766 | 33 | Strongly disagree to Revising and Updating | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating Str... | 771 766 | #bfa2fe | 41 | 45 | 0.804878 | 0.733333 | 0.590244 |
| 34 | 772 | 768 | 254 | Agree to Revising and Updating | Agree to Policymaking | Agree to Revising and Updating Agree to Policy... | 772 768 | #c7d626 | 353 | 318 | 0.719547 | 0.798742 | 0.574732 |
| 11 | 767 | 763 | 196 | Strongly agree to Policymaking | Strongly agree to dissemination of GK | Strongly agree to Policymaking Strongly agree ... | 767 763 | #da3ab2 | 278 | 276 | 0.705036 | 0.710145 | 0.500678 |
| 7 | 768 | 762 | 227 | Agree to Policymaking | Agree to dissemination of GK | Agree to Policymaking Agree to dissemination o... | 768 762 | #cff8d0 | 318 | 339 | 0.713836 | 0.669617 | 0.477997 |
| 0 | 766 | 761 | 33 | Strongly disagree to Policymaking | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking Strongly dis... | 766 761 | #fa3e0d | 45 | 56 | 0.733333 | 0.589286 | 0.432143 |
| 44 | 774 | 770 | 17 | Disagree to Revising and Updating | Disagree to Policymaking | Disagree to Revising and Updating Disagree to ... | 774 770 | #ba1e2b | 38 | 39 | 0.447368 | 0.435897 | 0.195007 |
| 42 | 775 | 769 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... | 775 769 | #5f6053 | 83 | 68 | 0.385542 | 0.470588 | 0.181432 |
| 22 | 769 | 764 | 23 | Neutral towards to Policymaking | Neutral towards to dissemination of GK | Neutral towards to Policymaking Neutral toward... | 769 764 | #313f23 | 68 | 48 | 0.338235 | 0.479167 | 0.162071 |
| 19 | 770 | 765 | 12 | Disagree to Policymaking | Disagree to dissemination of GK | Disagree to Policymaking Disagree to dissemina... | 770 765 | #17e257 | 39 | 35 | 0.307692 | 0.342857 | 0.105495 |
| 6 | 767 | 762 | 70 | Strongly agree to Policymaking | Agree to dissemination of GK | Strongly agree to Policymaking Agree to dissem... | 767 762 | #65d117 | 278 | 339 | 0.251799 | 0.206490 | 0.051994 |
| 12 | 768 | 763 | 56 | Agree to Policymaking | Strongly agree to dissemination of GK | Agree to Policymaking Strongly agree to dissem... | 768 763 | #62011b | 318 | 276 | 0.176101 | 0.202899 | 0.035731 |
| 37 | 775 | 768 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... | 775 768 | #5d40d3 | 83 | 318 | 0.361446 | 0.094340 | 0.034099 |
| 29 | 772 | 767 | 56 | Agree to Revising and Updating | Strongly agree to Policymaking | Agree to Revising and Updating Strongly agree ... | 772 767 | #7fbcbb | 353 | 278 | 0.158640 | 0.201439 | 0.031956 |
| 39 | 772 | 769 | 24 | Agree to Revising and Updating | Neutral towards to Policymaking | Agree to Revising and Updating Neutral towards... | 772 769 | #eb968b | 353 | 68 | 0.067989 | 0.352941 | 0.023996 |
| 8 | 769 | 762 | 22 | Neutral towards to Policymaking | Agree to dissemination of GK | Neutral towards to Policymaking Agree to disse... | 769 762 | #92bec1 | 68 | 339 | 0.323529 | 0.064897 | 0.020996 |
| 17 | 768 | 765 | 14 | Agree to Policymaking | Disagree to dissemination of GK | Agree to Policymaking Disagree to disseminatio... | 768 765 | #65f407 | 318 | 35 | 0.044025 | 0.400000 | 0.017610 |
| 9 | 770 | 762 | 15 | Disagree to Policymaking | Agree to dissemination of GK | Disagree to Policymaking Agree to disseminatio... | 770 762 | #748481 | 39 | 339 | 0.384615 | 0.044248 | 0.017018 |
| 4 | 770 | 761 | 6 | Disagree to Policymaking | Strongly disagree to dissemination of GK | Disagree to Policymaking Strongly disagree to ... | 770 761 | #49b3ae | 39 | 56 | 0.153846 | 0.107143 | 0.016484 |
| 45 | 775 | 770 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... | 775 770 | #d94e86 | 83 | 39 | 0.084337 | 0.179487 | 0.015137 |
| 21 | 768 | 764 | 15 | Agree to Policymaking | Neutral towards to dissemination of GK | Agree to Policymaking Neutral towards to disse... | 768 764 | #973e94 | 318 | 48 | 0.047170 | 0.312500 | 0.014741 |
| 43 | 772 | 770 | 13 | Agree to Revising and Updating | Disagree to Policymaking | Agree to Revising and Updating Disagree to Pol... | 772 770 | #3d1022 | 353 | 39 | 0.036827 | 0.333333 | 0.012276 |
| 13 | 769 | 763 | 15 | Neutral towards to Policymaking | Strongly agree to dissemination of GK | Neutral towards to Policymaking Strongly agree... | 769 763 | #8b4926 | 68 | 276 | 0.220588 | 0.054348 | 0.011988 |
| 15 | 766 | 765 | 4 | Strongly disagree to Policymaking | Disagree to dissemination of GK | Strongly disagree to Policymaking Disagree to ... | 766 765 | #59a586 | 45 | 35 | 0.088889 | 0.114286 | 0.010159 |
| 23 | 770 | 764 | 4 | Disagree to Policymaking | Neutral towards to dissemination of GK | Disagree to Policymaking Neutral towards to di... | 770 764 | #d567a4 | 39 | 48 | 0.102564 | 0.083333 | 0.008547 |
| 35 | 773 | 768 | 22 | Strongly agree to Revising and Updating | Agree to Policymaking | Strongly agree to Revising and Updating Agree ... | 773 768 | #8352c8 | 231 | 318 | 0.095238 | 0.069182 | 0.006589 |
| 32 | 775 | 767 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... | 775 767 | #1b4c87 | 83 | 278 | 0.144578 | 0.043165 | 0.006241 |
| 36 | 774 | 768 | 8 | Disagree to Revising and Updating | Agree to Policymaking | Disagree to Revising and Updating Agree to Pol... | 774 768 | #15de56 | 38 | 318 | 0.210526 | 0.025157 | 0.005296 |
| 27 | 774 | 766 | 3 | Disagree to Revising and Updating | Strongly disagree to Policymaking | Disagree to Revising and Updating Strongly dis... | 774 766 | #027724 | 38 | 45 | 0.078947 | 0.066667 | 0.005263 |
| 3 | 769 | 761 | 4 | Neutral towards to Policymaking | Strongly disagree to dissemination of GK | Neutral towards to Policymaking Strongly disag... | 769 761 | #57b34f | 68 | 56 | 0.058824 | 0.071429 | 0.004202 |
| 18 | 769 | 765 | 3 | Neutral towards to Policymaking | Disagree to dissemination of GK | Neutral towards to Policymaking Disagree to di... | 769 765 | #de8c1d | 68 | 35 | 0.044118 | 0.085714 | 0.003782 |
| 41 | 774 | 769 | 3 | Disagree to Revising and Updating | Neutral towards to Policymaking | Disagree to Revising and Updating Neutral towa... | 774 769 | #6fc34e | 38 | 68 | 0.078947 | 0.044118 | 0.003483 |
| 31 | 774 | 767 | 6 | Disagree to Revising and Updating | Strongly agree to Policymaking | Disagree to Revising and Updating Strongly agr... | 774 767 | #e213e2 | 38 | 278 | 0.157895 | 0.021583 | 0.003408 |
| 40 | 773 | 769 | 6 | Strongly agree to Revising and Updating | Neutral towards to Policymaking | Strongly agree to Revising and Updating Neutra... | 773 769 | #ee4859 | 231 | 68 | 0.025974 | 0.088235 | 0.002292 |
| 1 | 767 | 761 | 5 | Strongly agree to Policymaking | Strongly disagree to dissemination of GK | Strongly agree to Policymaking Strongly disagr... | 767 761 | #7bbd6b | 278 | 56 | 0.017986 | 0.089286 | 0.001606 |
| 25 | 772 | 766 | 5 | Agree to Revising and Updating | Strongly disagree to Policymaking | Agree to Revising and Updating Strongly disagr... | 772 766 | #fbe3d6 | 353 | 45 | 0.014164 | 0.111111 | 0.001574 |
| 38 | 771 | 769 | 2 | Strongly disagree to Revising and Updating | Neutral towards to Policymaking | Strongly disagree to Revising and Updating Neu... | 771 769 | #570f0c | 41 | 68 | 0.048780 | 0.029412 | 0.001435 |
| 2 | 768 | 761 | 5 | Agree to Policymaking | Strongly disagree to dissemination of GK | Agree to Policymaking Strongly disagree to dis... | 768 761 | #792808 | 318 | 56 | 0.015723 | 0.089286 | 0.001404 |
| 10 | 766 | 763 | 4 | Strongly disagree to Policymaking | Strongly agree to dissemination of GK | Strongly disagree to Policymaking Strongly agr... | 766 763 | #a05d02 | 45 | 276 | 0.088889 | 0.014493 | 0.001288 |
| 20 | 767 | 764 | 4 | Strongly agree to Policymaking | Neutral towards to dissemination of GK | Strongly agree to Policymaking Neutral towards... | 767 764 | #bbb264 | 278 | 48 | 0.014388 | 0.083333 | 0.001199 |
| 5 | 766 | 762 | 3 | Strongly disagree to Policymaking | Agree to dissemination of GK | Strongly disagree to Policymaking Agree to dis... | 766 762 | #d20880 | 45 | 339 | 0.066667 | 0.008850 | 0.000590 |
| 16 | 767 | 765 | 2 | Strongly agree to Policymaking | Disagree to dissemination of GK | Strongly agree to Policymaking Disagree to dis... | 767 765 | #67fc93 | 278 | 35 | 0.007194 | 0.057143 | 0.000411 |
| 26 | 773 | 766 | 2 | Strongly agree to Revising and Updating | Strongly disagree to Policymaking | Strongly agree to Revising and Updating Strong... | 773 766 | #b1945b | 231 | 45 | 0.008658 | 0.044444 | 0.000385 |
| 14 | 770 | 763 | 2 | Disagree to Policymaking | Strongly agree to dissemination of GK | Disagree to Policymaking Strongly agree to dis... | 770 763 | #c6c619 | 39 | 276 | 0.051282 | 0.007246 | 0.000372 |
| 28 | 771 | 767 | 2 | Strongly disagree to Revising and Updating | Strongly agree to Policymaking | Strongly disagree to Revising and Updating Str... | 771 767 | #7a750e | 41 | 278 | 0.048780 | 0.007194 | 0.000351 |
| 33 | 771 | 768 | 2 | Strongly disagree to Revising and Updating | Agree to Policymaking | Strongly disagree to Revising and Updating Agr... | 771 768 | #02b8c8 | 41 | 318 | 0.048780 | 0.006289 | 0.000307 |
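The p1p2 score multiplies the two conditional proportions: p1 is the pair count over all occurrences of the first option, p2 the pair count over all occurrences of the second, so p1p2 is large only when the pair dominates both margins. A quick stand-alone check against the top row of the table above (200 joint occurrences, marginals 231 and 278):

```python
# p1 = pair count / total occurrences of option A,
# p2 = pair count / total occurrences of option B,
# p1p2 = p1 * p2.
def cooccurrence_score(pair_count, total_a, total_b):
    p1 = pair_count / total_a
    p2 = pair_count / total_b
    return p1, p2, p1 * p2

# 'Strongly agree to Revising and Updating' with 'Strongly agree to Policymaking'
p1, p2, score = cooccurrence_score(200, 231, 278)
# round(score, 6) -> 0.622878, matching the table
```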
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
mpl.style.use('classic')
df_graph = rnxn.copy()
df_graph['From'] = df_graph[3].map(str)+' '+ ((df_graph['p1p2']*100).round(2)).map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts']
colors=cls['colour']
weights = df_graph['counts']
G = networkx.from_pandas_edgelist(
df_graph, source="From", target="To", edge_attr="Count"
)
####
# dynamic node sizes: scale each node's size by its degree
scale = 100
d = dict(G.degree)
#Updating dict
d.update((x, scale*y) for x, y in d.items())
####
plt.figure(figsize=(15,15))
plt.rcParams['figure.facecolor'] = 'white'
networkx.draw_networkx(G, pos=networkx.nx_pydot.graphviz_layout(G), edge_color=colors, node_color='blue', alpha=1,
                       width=weights*0.1, arrows=False, with_labels=True, font_size=10, font_family='sans-serif')
plt.tight_layout()
plt.savefig('network_3_4_5.png', dpi=300)
from pyvis.network import Network
got_net = Network(height='1080px', width='100%', bgcolor='#ffffff', font_color='black', directed=False)
# set the physics layout of the network
# got_net.barnes_hut()
got_data = rnxn
got_data = got_data[got_data['p1p2'] >= 0.1]
sources = got_data[3]
targets = got_data[4]
weights_edges = got_data['p1p2'].round(3)
weights_n1 = got_data['p1'].round(3)
weights_n2 = got_data['p2'].round(3)
colours = got_data['colour']
edge_data = zip(sources, targets, weights_edges, weights_n1, weights_n2, colours)
for e in edge_data:
    src, dst, we, wn1, wn2, c = e
    got_net.add_node(src, src, title=src, value=wn1, color=c)
    got_net.add_node(dst, dst, title=dst, value=wn2, color=c)
    got_net.add_edge(src, dst, value=we, color=c)
neighbor_map = got_net.get_adj_list()
edges = got_net.get_edges()
nodes=got_net.get_nodes()
N_nodes=len(nodes)
N_edges=len(edges)
weights=[[] for i in range(N_nodes)]
#Associating weights to neighbors
for i in range(N_nodes):  # loop through nodes
    for neighbor in neighbor_map[nodes[i]]:  # and their neighbours
        for j in range(N_edges):  # associate weights to the edge between node and neighbour
            if (edges[j]['from'] == nodes[i] and edges[j]['to'] == neighbor) or \
               (edges[j]['from'] == neighbor and edges[j]['to'] == nodes[i]):
                weights[i].append(edges[j]['value'])
for node, i in zip(got_net.nodes, range(N_nodes)):
    node['value'] = len(neighbor_map[node['id']])
    node['weight'] = [str(weights[i][k]) for k in range(len(weights[i]))]
    list_neighbor = list(neighbor_map[node['id']])
    # concatenate neighbours and weights
    hover_str = [list_neighbor[k] + ' ' + node['weight'][k] for k in range(node['value'])]
    # set up node title for hovering
    node['title'] += ' Neighbors:<br>' + '<br>'.join(hover_str)
got_net.show_buttons(filter_=['physics'])
got_net.show('allnet_network_3_4_5.html')
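The neighbour-weight bookkeeping above can be illustrated in isolation (toy edges and adjacency with hypothetical node names, not the survey data):

```python
# Toy version of the hover-weight association: for each node, collect the
# weight of the edge to every neighbour, matching edges in either direction.
edges = [{'from': 'A', 'to': 'B', 'value': 0.5},
         {'from': 'B', 'to': 'C', 'value': 0.2}]
neighbor_map = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}

def neighbour_weights(node):
    out = []
    for nb in neighbor_map[node]:
        for e in edges:
            if {e['from'], e['to']} == {node, nb}:
                out.append(e['value'])
    return out

# neighbour_weights('B') -> [0.5, 0.2]
```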
###### ALTERNATIVE METHOD, WITHOUT ZIGZAG - TOP PATHS
xor = pd.DataFrame(y).reset_index()
del xor['index']
del xor[0]
all_columns = list(xor.columns)
xor['count'] = 1
xor = xor.groupby(all_columns)['count'].sum().reset_index()
#xor = xor[xor['count'] > 1]
xor
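The grouped count above is just a frequency tally over complete answer tuples; the same idea with the standard library (toy codes echoing the option codes above, not real counts):

```python
from collections import Counter

# Each participant's answers form one tuple; identical tuples are one "path".
paths = [('762', '768', '772'), ('763', '767', '773'), ('762', '768', '772')]
top = Counter(paths).most_common()
# top[0] -> (('762', '768', '772'), 2)
```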
nxor = xor[all_columns].copy()
for column in all_columns:
    nxor[column] = nxor[column].map(str)
    nxor[column] = nxor[column].map(inv_map)
nxor
one_xor = pd.concat([xor, nxor], axis=1)
one_xor.sort_values(['count'], ascending=False, inplace=True)
#one_xor[one_xor['count'] > 1]
one_xor
| | 1 | 2 | 3 | count | 1 (label) | 2 (label) | 3 (label) |
|---|---|---|---|---|---|---|---|
| 22 | 762 | 768 | 772 | 185 | Agree to dissemination of GK | Agree to Policymaking | Agree to Revising and Updating |
| 37 | 763 | 767 | 773 | 149 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to Revising and Updating |
| 19 | 762 | 767 | 773 | 45 | Agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to Revising and Updating |
| 40 | 763 | 768 | 772 | 43 | Strongly agree to dissemination of GK | Agree to Policymaking | Agree to Revising and Updating |
| 36 | 763 | 767 | 772 | 38 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Agree to Revising and Updating |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1 | 761 | 766 | 772 | 1 | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking | Agree to Revising and Updating |
| 34 | 763 | 766 | 772 | 1 | Strongly agree to dissemination of GK | Strongly disagree to Policymaking | Agree to Revising and Updating |
| 27 | 762 | 769 | 773 | 1 | Agree to dissemination of GK | Neutral towards to Policymaking | Strongly agree to Revising and Updating |
| 17 | 762 | 766 | 773 | 1 | Agree to dissemination of GK | Strongly disagree to Policymaking | Strongly agree to Revising and Updating |
| 6 | 761 | 767 | 774 | 1 | Strongly disagree to dissemination of GK | Strongly agree to Policymaking | Disagree to Revising and Updating |
73 rows × 7 columns
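The "top paths" method above is a groupby-and-sum over the per-participant response combinations. A minimal, self-contained sketch of the same pattern on toy data (the answer labels here are illustrative, not the survey's codes):

```python
import pandas as pd

# toy per-participant responses: one row per participant, one column per item
responses = pd.DataFrame({
    1: ['Agree', 'Agree', 'Disagree', 'Agree'],
    2: ['Agree', 'Agree', 'Agree', 'Disagree'],
})

# count how often each exact combination of answers occurs,
# then sort so the most common path comes first
responses['count'] = 1
top_paths = (responses.groupby([1, 2])['count']
             .sum()
             .reset_index()
             .sort_values('count', ascending=False))
print(top_paths)
```

The most frequent combination ends up in the first row, mirroring how the table above ranks response paths by `count`.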
rnxn_triple = rnxn.copy()
xor_triple = one_xor.copy()
select= ['3', '4']
nndf = BNdf[BNdf['Group'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt=pd.DataFrame()
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# writing operations
wo = []
for i in range(len(counts['xcodes'])):
    wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
    """Return consecutive (source, target) pairs from each sequence in `seq`."""
    seq_int = [list(map(int, x)) for x in seq]
    x = []
    y = []
    for i in seq_int:
        for j, k in zip(i, i[1:]):
            x.append(j)
            y.append(k)
    return list(zip(x, y))
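`zigzag` turns each participant path (the id followed by option codes) into consecutive source→target pairs for the edge list. A quick self-contained check of that behaviour (re-implemented inline so it runs on its own; the paths are made-up):

```python
# each inner list is a path: participant id first, then option codes
paths = [['1', '762', '768'], ['2', '763', '767']]

def consecutive_pairs(seq):
    # same idea as zigzag: consecutive (source, target) pairs from each path
    out = []
    for p in seq:
        ints = list(map(int, p))
        out.extend(zip(ints, ints[1:]))
    return out

print(consecutive_pairs(paths))
```

Each path of length n contributes n-1 edges, so the id→first-option edge is included and is filtered out later with `c_path[c_path[0].isin(ranges)]`.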
# get a path graph
y = [p.split(',') for p in path_list]
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] #remove the participant id initials
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = pd.DataFrame(fif['connections'].unique())
fif
| | 1 | 2 | counts | 3 | 4 | label | connections |
|---|---|---|---|---|---|---|---|
| 0 | 759 | 766 | 1 | Strongly disagree to dissemination of GK | Agree to Policymaking | Strongly disagree to dissemination of GK Agree... | 759 766 |
| 1 | 760 | 764 | 1 | Agree to dissemination of GK | Strongly disagree to Policymaking | Agree to dissemination of GK Strongly disagree... | 760 764 |
| 2 | 761 | 761 | 1 | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK Strongly... | 761 761 |
| 3 | 761 | 762 | 1 | Strongly agree to dissemination of GK | Neutral towards to dissemination of GK | Strongly agree to dissemination of GK Neutral ... | 761 762 |
| 4 | 761 | 765 | 1 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to dissemination of GK Strongly... | 761 765 |
| 5 | 761 | 768 | 1 | Strongly agree to dissemination of GK | Disagree to Policymaking | Strongly agree to dissemination of GK Disagree... | 761 768 |
| 6 | 764 | 759 | 33 | Strongly disagree to Policymaking | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking Strongly dis... | 764 759 |
| 7 | 764 | 760 | 3 | Strongly disagree to Policymaking | Agree to dissemination of GK | Strongly disagree to Policymaking Agree to dis... | 764 760 |
| 8 | 764 | 761 | 4 | Strongly disagree to Policymaking | Strongly agree to dissemination of GK | Strongly disagree to Policymaking Strongly agr... | 764 761 |
| 9 | 764 | 763 | 4 | Strongly disagree to Policymaking | Disagree to dissemination of GK | Strongly disagree to Policymaking Disagree to ... | 764 763 |
| 10 | 765 | 759 | 5 | Strongly agree to Policymaking | Strongly disagree to dissemination of GK | Strongly agree to Policymaking Strongly disagr... | 765 759 |
| 11 | 765 | 760 | 70 | Strongly agree to Policymaking | Agree to dissemination of GK | Strongly agree to Policymaking Agree to dissem... | 765 760 |
| 12 | 765 | 761 | 196 | Strongly agree to Policymaking | Strongly agree to dissemination of GK | Strongly agree to Policymaking Strongly agree ... | 765 761 |
| 13 | 765 | 762 | 4 | Strongly agree to Policymaking | Neutral towards to dissemination of GK | Strongly agree to Policymaking Neutral towards... | 765 762 |
| 14 | 765 | 763 | 2 | Strongly agree to Policymaking | Disagree to dissemination of GK | Strongly agree to Policymaking Disagree to dis... | 765 763 |
| 15 | 766 | 759 | 5 | Agree to Policymaking | Strongly disagree to dissemination of GK | Agree to Policymaking Strongly disagree to dis... | 766 759 |
| 16 | 766 | 760 | 227 | Agree to Policymaking | Agree to dissemination of GK | Agree to Policymaking Agree to dissemination o... | 766 760 |
| 17 | 766 | 761 | 56 | Agree to Policymaking | Strongly agree to dissemination of GK | Agree to Policymaking Strongly agree to dissem... | 766 761 |
| 18 | 766 | 762 | 15 | Agree to Policymaking | Neutral towards to dissemination of GK | Agree to Policymaking Neutral towards to disse... | 766 762 |
| 19 | 766 | 763 | 14 | Agree to Policymaking | Disagree to dissemination of GK | Agree to Policymaking Disagree to disseminatio... | 766 763 |
| 20 | 767 | 759 | 4 | Neutral towards to Policymaking | Strongly disagree to dissemination of GK | Neutral towards to Policymaking Strongly disag... | 767 759 |
| 21 | 767 | 760 | 22 | Neutral towards to Policymaking | Agree to dissemination of GK | Neutral towards to Policymaking Agree to disse... | 767 760 |
| 22 | 767 | 761 | 15 | Neutral towards to Policymaking | Strongly agree to dissemination of GK | Neutral towards to Policymaking Strongly agree... | 767 761 |
| 23 | 767 | 762 | 23 | Neutral towards to Policymaking | Neutral towards to dissemination of GK | Neutral towards to Policymaking Neutral toward... | 767 762 |
| 24 | 767 | 763 | 3 | Neutral towards to Policymaking | Disagree to dissemination of GK | Neutral towards to Policymaking Disagree to di... | 767 763 |
| 25 | 768 | 759 | 6 | Disagree to Policymaking | Strongly disagree to dissemination of GK | Disagree to Policymaking Strongly disagree to ... | 768 759 |
| 26 | 768 | 760 | 15 | Disagree to Policymaking | Agree to dissemination of GK | Disagree to Policymaking Agree to disseminatio... | 768 760 |
| 27 | 768 | 761 | 2 | Disagree to Policymaking | Strongly agree to dissemination of GK | Disagree to Policymaking Strongly agree to dis... | 768 761 |
| 28 | 768 | 762 | 4 | Disagree to Policymaking | Neutral towards to dissemination of GK | Disagree to Policymaking Neutral towards to di... | 768 762 |
| 29 | 768 | 763 | 12 | Disagree to Policymaking | Disagree to dissemination of GK | Disagree to Policymaking Disagree to dissemina... | 768 763 |
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(amount):
    # lower bound fixed at 0 (not i) so every colour spans the full range
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
print(colour)
['#54eb4e', '#7d8b1e', '#a7b87e', '#f47556', '#8236bb', '#dc17cb', '#8a83ec', '#ea617f', '#363a5e', '#cb9c02', '#2e7466', '#f2100c', '#c52ce4', '#d477fd', '#cd95d6', '#51db77', '#f6a227', '#ebbd5b', '#56af68', '#a005eb', '#3f7045', '#3f4672', '#1e489e', '#919ba4', '#d8e9b0', '#6235ca', '#53c4a6', '#b09b9a', '#35e756', '#bae48b']
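Fully random hex colours can land visually close to one another. A hedged alternative (not what this notebook uses) is to space hues evenly in HSV and convert to hex:

```python
import colorsys

def distinct_colours(n):
    """Evenly spaced hues converted to hex, so adjacent colours stay distinct."""
    colours = []
    for i in range(n):
        # fixed saturation/value, hue swept around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        colours.append('#%02x%02x%02x' % (int(r * 255), int(g * 255), int(b * 255)))
    return colours

print(distinct_colours(5))
```

This is deterministic, so re-running the notebook keeps the same category→colour mapping.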
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
| | 1 | 2 | counts | 3 | 4 | label | connections | colour |
|---|---|---|---|---|---|---|---|---|
| 0 | 759 | 766 | 1 | Strongly disagree to dissemination of GK | Agree to Policymaking | Strongly disagree to dissemination of GK Agree... | 759 766 | #54eb4e |
| 1 | 760 | 764 | 1 | Agree to dissemination of GK | Strongly disagree to Policymaking | Agree to dissemination of GK Strongly disagree... | 760 764 | #7d8b1e |
| 2 | 761 | 761 | 1 | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK Strongly... | 761 761 | #a7b87e |
| 3 | 761 | 762 | 1 | Strongly agree to dissemination of GK | Neutral towards to dissemination of GK | Strongly agree to dissemination of GK Neutral ... | 761 762 | #f47556 |
| 4 | 761 | 765 | 1 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to dissemination of GK Strongly... | 761 765 | #8236bb |
| 5 | 761 | 768 | 1 | Strongly agree to dissemination of GK | Disagree to Policymaking | Strongly agree to dissemination of GK Disagree... | 761 768 | #dc17cb |
| 6 | 764 | 759 | 33 | Strongly disagree to Policymaking | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking Strongly dis... | 764 759 | #8a83ec |
| 7 | 764 | 760 | 3 | Strongly disagree to Policymaking | Agree to dissemination of GK | Strongly disagree to Policymaking Agree to dis... | 764 760 | #ea617f |
| 8 | 764 | 761 | 4 | Strongly disagree to Policymaking | Strongly agree to dissemination of GK | Strongly disagree to Policymaking Strongly agr... | 764 761 | #363a5e |
| 9 | 764 | 763 | 4 | Strongly disagree to Policymaking | Disagree to dissemination of GK | Strongly disagree to Policymaking Disagree to ... | 764 763 | #cb9c02 |
| 10 | 765 | 759 | 5 | Strongly agree to Policymaking | Strongly disagree to dissemination of GK | Strongly agree to Policymaking Strongly disagr... | 765 759 | #2e7466 |
| 11 | 765 | 760 | 70 | Strongly agree to Policymaking | Agree to dissemination of GK | Strongly agree to Policymaking Agree to dissem... | 765 760 | #f2100c |
| 12 | 765 | 761 | 196 | Strongly agree to Policymaking | Strongly agree to dissemination of GK | Strongly agree to Policymaking Strongly agree ... | 765 761 | #c52ce4 |
| 13 | 765 | 762 | 4 | Strongly agree to Policymaking | Neutral towards to dissemination of GK | Strongly agree to Policymaking Neutral towards... | 765 762 | #d477fd |
| 14 | 765 | 763 | 2 | Strongly agree to Policymaking | Disagree to dissemination of GK | Strongly agree to Policymaking Disagree to dis... | 765 763 | #cd95d6 |
| 15 | 766 | 759 | 5 | Agree to Policymaking | Strongly disagree to dissemination of GK | Agree to Policymaking Strongly disagree to dis... | 766 759 | #51db77 |
| 16 | 766 | 760 | 227 | Agree to Policymaking | Agree to dissemination of GK | Agree to Policymaking Agree to dissemination o... | 766 760 | #f6a227 |
| 17 | 766 | 761 | 56 | Agree to Policymaking | Strongly agree to dissemination of GK | Agree to Policymaking Strongly agree to dissem... | 766 761 | #ebbd5b |
| 18 | 766 | 762 | 15 | Agree to Policymaking | Neutral towards to dissemination of GK | Agree to Policymaking Neutral towards to disse... | 766 762 | #56af68 |
| 19 | 766 | 763 | 14 | Agree to Policymaking | Disagree to dissemination of GK | Agree to Policymaking Disagree to disseminatio... | 766 763 | #a005eb |
| 20 | 767 | 759 | 4 | Neutral towards to Policymaking | Strongly disagree to dissemination of GK | Neutral towards to Policymaking Strongly disag... | 767 759 | #3f7045 |
| 21 | 767 | 760 | 22 | Neutral towards to Policymaking | Agree to dissemination of GK | Neutral towards to Policymaking Agree to disse... | 767 760 | #3f4672 |
| 22 | 767 | 761 | 15 | Neutral towards to Policymaking | Strongly agree to dissemination of GK | Neutral towards to Policymaking Strongly agree... | 767 761 | #1e489e |
| 23 | 767 | 762 | 23 | Neutral towards to Policymaking | Neutral towards to dissemination of GK | Neutral towards to Policymaking Neutral toward... | 767 762 | #919ba4 |
| 24 | 767 | 763 | 3 | Neutral towards to Policymaking | Disagree to dissemination of GK | Neutral towards to Policymaking Disagree to di... | 767 763 | #d8e9b0 |
| 25 | 768 | 759 | 6 | Disagree to Policymaking | Strongly disagree to dissemination of GK | Disagree to Policymaking Strongly disagree to ... | 768 759 | #6235ca |
| 26 | 768 | 760 | 15 | Disagree to Policymaking | Agree to dissemination of GK | Disagree to Policymaking Agree to disseminatio... | 768 760 | #53c4a6 |
| 27 | 768 | 761 | 2 | Disagree to Policymaking | Strongly agree to dissemination of GK | Disagree to Policymaking Strongly agree to dis... | 768 761 | #b09b9a |
| 28 | 768 | 762 | 4 | Disagree to Policymaking | Neutral towards to dissemination of GK | Disagree to Policymaking Neutral towards to di... | 768 762 | #35e756 |
| 29 | 768 | 763 | 12 | Disagree to Policymaking | Disagree to dissemination of GK | Disagree to Policymaking Disagree to dissemina... | 768 763 | #bae48b |
### filter here for single counts
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 0]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
fig.update_layout(title='Sankey plot',
                  # other options for the plot
                  hoverlabel=dict(font=dict(family='sans-serif', size=100)))
fig = fig.update_layout(margin=dict(t=100))
fig.write_html("/home/manu10/Downloads/iglas_work/sankey_3_4.html")
fig.show()
def nodify(node_names):
    # unique name beginnings (first character of each node name)
    ends = sorted(set(e[0] for e in node_names))
    # x-interval between node columns
    steps = 1 / 4
    # x-value for each unique name beginning,
    # for input as node position
    nodes_x = {}
    xVal = 0
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form (one y-value per node)
    x_values = [nodes_x[n[0]] for n in node_names]
    y_values = [x * 0.03 for x in range(1, len(x_values) + 1)]
    return x_values, y_values
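The core of `nodify` is mapping each node's first character to an evenly spaced x-position so nodes with the same name-beginning share a Sankey column. The mapping can be sketched on toy names (labels are illustrative):

```python
# toy node names; the first character decides which column a node sits in
names = ['A one', 'A two', 'B one', 'C one']

# unique first characters, sorted, each mapped to an evenly spaced x-position
firsts = sorted(set(n[0] for n in names))
step = 1 / 4
nodes_x = {c: i * step for i, c in enumerate(firsts)}

x_values = [nodes_x[n[0]] for n in names]
print(x_values)  # → [0.0, 0.0, 0.25, 0.5]
```

Note the hard-coded `1/4` spacing assumes at most four distinct beginnings; with more, columns would spill past x = 1.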
# map colours to categories
import random
# generate random colours
amount = len(npaths['name'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
print(colour)
['#18ddcb', '#d1d26c', '#0c0802', '#06ab5c', '#b10cc9', '#75d134', '#d4fe7f', '#ae21ee', '#6edf3c', '#74e3f1', '#b76365', '#a70c74', '#d56796', '#e2ff82', '#32ca52', '#529892', '#00e698', '#838d2b', '#7a571a', '#f76b77', '#4b2163', '#716561', '#8cc6f7', '#0f5221', '#f72ff8', '#765bdf', '#3f7b09', '#2bc58b', '#2a9fec', '#bfa96b', '#58f6ef', '#c9d333', '#b2183c', '#b34435', '#4242f4', '#d43db4', '#3e525d', '#1c9cb5', '#408ed0', '#6b6730', '#92db92', '#91797d', '#b23734', '#841729', '#f8f2ae', '#6486d5', '#fe2ba9', '#d8a75c', '#cbd6d7', '#2049c9', '#109cb1', '#c8d577', '#cc5f50', '#51d60b', '#dd6b4c', '#571614', '#22e8d3', '#0199df', '#cf5e04', '#bad303', '#f0018e', '#3f2a56', '#6e1d5b', '#b2bc44', '#2424b0', '#80ac8e', '#4280ef', '#e04d7a', '#a55461', '#4e20a0', '#39ff61', '#4a4297', '#2b3a2b', '#a6b5fb', '#2a46b1', '#86f8cc', '#c52350', '#46f9c1', '#27d78e']
####### GET SOME SIGNIFICANT PATHS, options occurring together
nfif = fif[fif['counts'] > 0]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax.id = 1
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
rnxn
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 765 | 761 | 196 | Strongly agree to Policymaking | Strongly agree to dissemination of GK | Strongly agree to Policymaking Strongly agree ... | 765 761 | #c52ce4 | 278 | 276 | 0.705036 | 0.710145 | 0.500678 |
| 22 | 766 | 760 | 227 | Agree to Policymaking | Agree to dissemination of GK | Agree to Policymaking Agree to dissemination o... | 766 760 | #f6a227 | 318 | 339 | 0.713836 | 0.669617 | 0.477997 |
| 15 | 764 | 759 | 33 | Strongly disagree to Policymaking | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking Strongly dis... | 764 759 | #8a83ec | 45 | 56 | 0.733333 | 0.589286 | 0.432143 |
| 11 | 767 | 762 | 23 | Neutral towards to Policymaking | Neutral towards to dissemination of GK | Neutral towards to Policymaking Neutral toward... | 767 762 | #919ba4 | 68 | 48 | 0.338235 | 0.479167 | 0.162071 |
| 29 | 768 | 763 | 12 | Disagree to Policymaking | Disagree to dissemination of GK | Disagree to Policymaking Disagree to dissemina... | 768 763 | #bae48b | 39 | 35 | 0.307692 | 0.342857 | 0.105495 |
| 21 | 765 | 760 | 70 | Strongly agree to Policymaking | Agree to dissemination of GK | Strongly agree to Policymaking Agree to dissem... | 765 760 | #f2100c | 278 | 339 | 0.251799 | 0.206490 | 0.051994 |
| 5 | 766 | 761 | 56 | Agree to Policymaking | Strongly agree to dissemination of GK | Agree to Policymaking Strongly agree to dissem... | 766 761 | #ebbd5b | 318 | 276 | 0.176101 | 0.202899 | 0.035731 |
| 23 | 767 | 760 | 22 | Neutral towards to Policymaking | Agree to dissemination of GK | Neutral towards to Policymaking Agree to disse... | 767 760 | #3f4672 | 68 | 339 | 0.323529 | 0.064897 | 0.020996 |
| 27 | 766 | 763 | 14 | Agree to Policymaking | Disagree to dissemination of GK | Agree to Policymaking Disagree to disseminatio... | 766 763 | #a005eb | 318 | 35 | 0.044025 | 0.400000 | 0.017610 |
| 24 | 768 | 760 | 15 | Disagree to Policymaking | Agree to dissemination of GK | Disagree to Policymaking Agree to disseminatio... | 768 760 | #53c4a6 | 39 | 339 | 0.384615 | 0.044248 | 0.017018 |
| 19 | 768 | 759 | 6 | Disagree to Policymaking | Strongly disagree to dissemination of GK | Disagree to Policymaking Strongly disagree to ... | 768 759 | #6235ca | 39 | 56 | 0.153846 | 0.107143 | 0.016484 |
| 10 | 766 | 762 | 15 | Agree to Policymaking | Neutral towards to dissemination of GK | Agree to Policymaking Neutral towards to disse... | 766 762 | #56af68 | 318 | 48 | 0.047170 | 0.312500 | 0.014741 |
| 6 | 767 | 761 | 15 | Neutral towards to Policymaking | Strongly agree to dissemination of GK | Neutral towards to Policymaking Strongly agree... | 767 761 | #1e489e | 68 | 276 | 0.220588 | 0.054348 | 0.011988 |
| 25 | 764 | 763 | 4 | Strongly disagree to Policymaking | Disagree to dissemination of GK | Strongly disagree to Policymaking Disagree to ... | 764 763 | #cb9c02 | 45 | 35 | 0.088889 | 0.114286 | 0.010159 |
| 12 | 768 | 762 | 4 | Disagree to Policymaking | Neutral towards to dissemination of GK | Disagree to Policymaking Neutral towards to di... | 768 762 | #35e756 | 39 | 48 | 0.102564 | 0.083333 | 0.008547 |
| 18 | 767 | 759 | 4 | Neutral towards to Policymaking | Strongly disagree to dissemination of GK | Neutral towards to Policymaking Strongly disag... | 767 759 | #3f7045 | 68 | 56 | 0.058824 | 0.071429 | 0.004202 |
| 28 | 767 | 763 | 3 | Neutral towards to Policymaking | Disagree to dissemination of GK | Neutral towards to Policymaking Disagree to di... | 767 763 | #d8e9b0 | 68 | 35 | 0.044118 | 0.085714 | 0.003782 |
| 16 | 765 | 759 | 5 | Strongly agree to Policymaking | Strongly disagree to dissemination of GK | Strongly agree to Policymaking Strongly disagr... | 765 759 | #2e7466 | 278 | 56 | 0.017986 | 0.089286 | 0.001606 |
| 17 | 766 | 759 | 5 | Agree to Policymaking | Strongly disagree to dissemination of GK | Agree to Policymaking Strongly disagree to dis... | 766 759 | #51db77 | 318 | 56 | 0.015723 | 0.089286 | 0.001404 |
| 3 | 764 | 761 | 4 | Strongly disagree to Policymaking | Strongly agree to dissemination of GK | Strongly disagree to Policymaking Strongly agr... | 764 761 | #363a5e | 45 | 276 | 0.088889 | 0.014493 | 0.001288 |
| 9 | 765 | 762 | 4 | Strongly agree to Policymaking | Neutral towards to dissemination of GK | Strongly agree to Policymaking Neutral towards... | 765 762 | #d477fd | 278 | 48 | 0.014388 | 0.083333 | 0.001199 |
| 20 | 764 | 760 | 3 | Strongly disagree to Policymaking | Agree to dissemination of GK | Strongly disagree to Policymaking Agree to dis... | 764 760 | #ea617f | 45 | 339 | 0.066667 | 0.008850 | 0.000590 |
| 26 | 765 | 763 | 2 | Strongly agree to Policymaking | Disagree to dissemination of GK | Strongly agree to Policymaking Disagree to dis... | 765 763 | #cd95d6 | 278 | 35 | 0.007194 | 0.057143 | 0.000411 |
| 7 | 768 | 761 | 2 | Disagree to Policymaking | Strongly agree to dissemination of GK | Disagree to Policymaking Strongly agree to dis... | 768 761 | #b09b9a | 39 | 276 | 0.051282 | 0.007246 | 0.000372 |
| 14 | 761 | 768 | 1 | Strongly agree to dissemination of GK | Disagree to Policymaking | Strongly agree to dissemination of GK Disagree... | 761 768 | #dc17cb | 276 | 39 | 0.003623 | 0.025641 | 0.000093 |
| 8 | 761 | 762 | 1 | Strongly agree to dissemination of GK | Neutral towards to dissemination of GK | Strongly agree to dissemination of GK Neutral ... | 761 762 | #f47556 | 276 | 48 | 0.003623 | 0.020833 | 0.000075 |
| 1 | 760 | 764 | 1 | Agree to dissemination of GK | Strongly disagree to Policymaking | Agree to dissemination of GK Strongly disagree... | 760 764 | #7d8b1e | 339 | 45 | 0.002950 | 0.022222 | 0.000066 |
| 0 | 759 | 766 | 1 | Strongly disagree to dissemination of GK | Agree to Policymaking | Strongly disagree to dissemination of GK Agree... | 759 766 | #54eb4e | 56 | 318 | 0.017857 | 0.003145 | 0.000056 |
| 2 | 761 | 761 | 1 | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK Strongly... | 761 761 | #a7b87e | 276 | 276 | 0.003623 | 0.003623 | 0.000013 |
| 13 | 761 | 765 | 1 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to dissemination of GK Strongly... | 761 765 | #8236bb | 276 | 278 | 0.003623 | 0.003597 | 0.000013 |
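The `p1p2` score above is the product of two conditional proportions: the pair count divided by each option's own total. Using the top row of the table (196 participants chose both "Strongly agree to Policymaking", total 278, and "Strongly agree to dissemination of GK", total 276) as a worked example:

```python
# counts taken from the top row of the rnxn table above
pair_count = 196   # participants who chose both options
total_a = 278      # participants who chose option A at all
total_b = 276      # participants who chose option B at all

p1 = pair_count / total_a   # share of option-A choosers who also chose B
p2 = pair_count / total_b   # share of option-B choosers who also chose A
p1p2 = p1 * p2              # joint score used to rank the strongest pairings
print(round(p1p2, 6))       # → 0.500678
```

The score is high only when the pairing is common relative to *both* options, which is why rare-but-one-sided combinations sink to the bottom of the table.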
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
mpl.style.use('classic')
df_graph = rnxn
df_graph['From'] = df_graph[3].map(str)+' '+ ((df_graph['p1p2']*100).round(2)).map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts']
colors=cls['colour']
weights = df_graph['counts']
G = networkx.from_pandas_edgelist(
df_graph, source="From", target="To", edge_attr="Count"
)
####
# dynamic node sizes
scale = 100  # scale factor applied to each node's degree
d = dict(G.degree)
# degree-scaled sizes (can be passed to the draw call via node_size=list(d.values()))
d.update((x, scale * y) for x, y in d.items())
####
plt.figure(figsize=(15,15))
plt.rcParams['figure.facecolor'] = 'white'
graph_pos = networkx.spring_layout(G)
# draw_networkx returns None, so don't assign its result back to G
networkx.draw_networkx(G, pos=networkx.nx_pydot.graphviz_layout(G), edge_color=colors, node_color='blue', alpha=1,
                       width=weights * 0.1, arrows=False, with_labels=True, font_size=10, font_family='sans-serif'
                       )
plt.tight_layout()
plt.savefig('network_3_4.png', dpi=300)
###### ALTERNATIVE METHOD, WITHOUT ZIGZAG - TOP PATHS
xor = pd.DataFrame(y).reset_index()
del xor['index']
del xor[0]
all_columns = list(xor.columns)
xor['count'] = 1
xor = xor.groupby(all_columns)['count'].sum().reset_index()
#xor = xor[xor['count'] > 1]
xor
nxor = xor[all_columns].copy()
for column in all_columns:
    nxor[column] = nxor[column].map(str)
    nxor[column] = nxor[column].map(inv_map)
nxor
one_xor = pd.concat([xor, nxor], axis=1)
one_xor.sort_values(['count'], ascending=False, inplace=True)
#one_xor[one_xor['count'] > 1]
one_xor
| | 1 | 2 | count | 1 (label) | 2 (label) |
|---|---|---|---|---|---|
| 7 | 760 | 766 | 227 | Agree to dissemination of GK | Agree to Policymaking |
| 11 | 761 | 765 | 196 | Strongly agree to dissemination of GK | Strongly agree to Policymaking |
| 6 | 760 | 765 | 70 | Agree to dissemination of GK | Strongly agree to Policymaking |
| 12 | 761 | 766 | 56 | Strongly agree to dissemination of GK | Agree to Policymaking |
| 0 | 759 | 764 | 33 | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking |
| 17 | 762 | 767 | 23 | Neutral towards to dissemination of GK | Neutral towards to Policymaking |
| 8 | 760 | 767 | 22 | Agree to dissemination of GK | Neutral towards to Policymaking |
| 9 | 760 | 768 | 15 | Agree to dissemination of GK | Disagree to Policymaking |
| 13 | 761 | 767 | 15 | Strongly agree to dissemination of GK | Neutral towards to Policymaking |
| 16 | 762 | 766 | 15 | Neutral towards to dissemination of GK | Agree to Policymaking |
| 21 | 763 | 766 | 14 | Disagree to dissemination of GK | Agree to Policymaking |
| 23 | 763 | 768 | 12 | Disagree to dissemination of GK | Disagree to Policymaking |
| 4 | 759 | 768 | 6 | Strongly disagree to dissemination of GK | Disagree to Policymaking |
| 2 | 759 | 766 | 5 | Strongly disagree to dissemination of GK | Agree to Policymaking |
| 1 | 759 | 765 | 5 | Strongly disagree to dissemination of GK | Strongly agree to Policymaking |
| 3 | 759 | 767 | 4 | Strongly disagree to dissemination of GK | Neutral towards to Policymaking |
| 10 | 761 | 764 | 4 | Strongly agree to dissemination of GK | Strongly disagree to Policymaking |
| 15 | 762 | 765 | 4 | Neutral towards to dissemination of GK | Strongly agree to Policymaking |
| 18 | 762 | 768 | 4 | Neutral towards to dissemination of GK | Disagree to Policymaking |
| 19 | 763 | 764 | 4 | Disagree to dissemination of GK | Strongly disagree to Policymaking |
| 22 | 763 | 767 | 3 | Disagree to dissemination of GK | Neutral towards to Policymaking |
| 5 | 760 | 764 | 3 | Agree to dissemination of GK | Strongly disagree to Policymaking |
| 20 | 763 | 765 | 2 | Disagree to dissemination of GK | Strongly agree to Policymaking |
| 14 | 761 | 768 | 2 | Strongly agree to dissemination of GK | Disagree to Policymaking |
rnxn_34 = df_graph.copy()
xor_34 = one_xor.copy()
select= ['4', '5']
nndf = BNdf[BNdf['Group'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt=pd.DataFrame()
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# writing operations
wo = []
for i in range(len(counts['xcodes'])):
    wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
    """Return consecutive (source, target) pairs from each sequence in `seq`."""
    seq_int = [list(map(int, x)) for x in seq]
    x = []
    y = []
    for i in seq_int:
        for j, k in zip(i, i[1:]):
            x.append(j)
            y.append(k)
    return list(zip(x, y))
# get a path graph
y = [p.split(',') for p in path_list]
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] #remove the participant id initials
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = pd.DataFrame(fif['connections'].unique())
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF)) # full 24-bit colour range
print(colour)
['#ccd829', '#8045a3', '#5a05c9', '#ce7d3e', '#56f551', '#26e722', '#3c93a3', '#af8146', '#7666c3', '#811fed', '#8a45b6', '#035aa2', '#e82a28', '#66f804', '#3312aa', '#2e3c37', '#b41e47', '#34ff21', '#336d1e', '#1a14e7', '#ea3dc5', '#c3b538', '#86db2b', '#7069f4', '#2715b7', '#cd0cb2']
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
| | 1 | 2 | counts | 3 | 4 | label | connections | colour |
|---|---|---|---|---|---|---|---|---|
| 0 | 755 | 761 | 1 | Strongly agree to Policymaking | Strongly agree to Revising and Updating | Strongly agree to Policymaking Strongly agree ... | 755 761 | #ccd829 |
| 1 | 757 | 762 | 1 | Neutral towards to Policymaking | Disagree to Revising and Updating | Neutral towards to Policymaking Disagree to Re... | 757 762 | #8045a3 |
| 2 | 759 | 754 | 33 | Strongly disagree to Revising and Updating | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating Str... | 759 754 | #5a05c9 |
| 3 | 759 | 755 | 2 | Strongly disagree to Revising and Updating | Strongly agree to Policymaking | Strongly disagree to Revising and Updating Str... | 759 755 | #ce7d3e |
| 4 | 759 | 756 | 2 | Strongly disagree to Revising and Updating | Agree to Policymaking | Strongly disagree to Revising and Updating Agr... | 759 756 | #56f551 |
| 5 | 759 | 757 | 2 | Strongly disagree to Revising and Updating | Neutral towards to Policymaking | Strongly disagree to Revising and Updating Neu... | 759 757 | #26e722 |
| 6 | 759 | 758 | 1 | Strongly disagree to Revising and Updating | Disagree to Policymaking | Strongly disagree to Revising and Updating Dis... | 759 758 | #3c93a3 |
| 7 | 760 | 754 | 5 | Agree to Revising and Updating | Strongly disagree to Policymaking | Agree to Revising and Updating Strongly disagr... | 760 754 | #af8146 |
| 8 | 760 | 755 | 56 | Agree to Revising and Updating | Strongly agree to Policymaking | Agree to Revising and Updating Strongly agree ... | 760 755 | #7666c3 |
| 9 | 760 | 756 | 254 | Agree to Revising and Updating | Agree to Policymaking | Agree to Revising and Updating Agree to Policy... | 760 756 | #811fed |
| 10 | 760 | 757 | 24 | Agree to Revising and Updating | Neutral towards to Policymaking | Agree to Revising and Updating Neutral towards... | 760 757 | #8a45b6 |
| 11 | 760 | 758 | 13 | Agree to Revising and Updating | Disagree to Policymaking | Agree to Revising and Updating Disagree to Pol... | 760 758 | #035aa2 |
| 12 | 761 | 754 | 2 | Strongly agree to Revising and Updating | Strongly disagree to Policymaking | Strongly agree to Revising and Updating Strong... | 761 754 | #e82a28 |
| 13 | 761 | 755 | 200 | Strongly agree to Revising and Updating | Strongly agree to Policymaking | Strongly agree to Revising and Updating Strong... | 761 755 | #66f804 |
| 14 | 761 | 756 | 22 | Strongly agree to Revising and Updating | Agree to Policymaking | Strongly agree to Revising and Updating Agree ... | 761 756 | #3312aa |
| 15 | 761 | 757 | 6 | Strongly agree to Revising and Updating | Neutral towards to Policymaking | Strongly agree to Revising and Updating Neutra... | 761 757 | #2e3c37 |
| 16 | 761 | 758 | 1 | Strongly agree to Revising and Updating | Disagree to Policymaking | Strongly agree to Revising and Updating Disagr... | 761 758 | #b41e47 |
| 17 | 762 | 754 | 3 | Disagree to Revising and Updating | Strongly disagree to Policymaking | Disagree to Revising and Updating Strongly dis... | 762 754 | #34ff21 |
| 18 | 762 | 755 | 6 | Disagree to Revising and Updating | Strongly agree to Policymaking | Disagree to Revising and Updating Strongly agr... | 762 755 | #336d1e |
| 19 | 762 | 756 | 8 | Disagree to Revising and Updating | Agree to Policymaking | Disagree to Revising and Updating Agree to Pol... | 762 756 | #1a14e7 |
| 20 | 762 | 757 | 3 | Disagree to Revising and Updating | Neutral towards to Policymaking | Disagree to Revising and Updating Neutral towa... | 762 757 | #ea3dc5 |
| 21 | 762 | 758 | 17 | Disagree to Revising and Updating | Disagree to Policymaking | Disagree to Revising and Updating Disagree to ... | 762 758 | #c3b538 |
| 22 | 763 | 755 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... | 763 755 | #86db2b |
| 23 | 763 | 756 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... | 763 756 | #7069f4 |
| 24 | 763 | 757 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... | 763 757 | #2715b7 |
| 25 | 763 | 758 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... | 763 758 | #cd0cb2 |
### keep only frequent transitions (counts > 10) for the plot
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 10]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
# apply the layout directly to the figure; a standalone go.Layout(...) is discarded
fig = fig.update_layout(title='Sankey plot',
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)),
                        margin=dict(t=100))
fig.write_html("/home/manu10/Downloads/iglas_work/sankey_4_5.html")
fig.show()
def nodify(node_names):
    # note: this cell must run before the nodify(...) call above
    # unique first characters of the names; each gets its own column
    ends = sorted(list(set([e[0] for e in node_names])))
    # horizontal interval between node columns
    steps = 1/4
    # x-value for each unique name beginning,
    # for input as node position
    nodes_x = {}
    xVal = 0
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form (one y per node)
    x_values = [nodes_x[n[0]] for n in node_names]
    y_values = [x*0.03 for x in range(1, len(x_values)+1)]
    return x_values, y_values
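The x-positioning idea inside `nodify` can be illustrated on its own: nodes whose labels start with the same character share an x column, spaced 1/4 apart. A minimal sketch with hypothetical label names:

```python
# Minimal sketch of the column assignment: group node labels by their first
# character and give each group the next x position in steps of 0.25.
def node_x_positions(node_names, step=0.25):
    firsts = sorted({n[0] for n in node_names})
    col_x = {c: i * step for i, c in enumerate(firsts)}
    return [col_x[n[0]] for n in node_names]

names = ['Agree A', 'Agree B', 'Disagree A', 'Neutral A']
print(node_x_positions(names))  # [0.0, 0.0, 0.25, 0.5]
```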
# map colours to categories
import random
# generate random colours
amount = len(npaths['name'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF)) # full 24-bit colour range
print(colour)
['#d8da86', '#52d3ad', '#76ecc4', '#187949', '#4e3107', '#52482d', '#1be6cb', '#e44ed1', '#ab66e3', '#ddd5c5', '#0ff188', '#44dfc1', '#127456', '#c8b7f4', '#8db641', '#dcd384', '#e9283e', '#cd5ec2', '#dcf75b', '#241a06', '#05804e', '#fbbba5', '#b9412c', '#b76098', '#872ccf', '#040226', '#5cc063', '#547722', '#9f6052', '#e635fd', '#9a543d', '#3ccff2', '#1df726', '#7e5504', '#1b61b3', '#361e72', '#730086', '#cb9264', '#a8994f', '#670432', '#b6c262', '#a25d61', '#87b627', '#9ded27', '#7d0549', '#844ae5', '#16f4b5', '#25dc0d', '#87abde', '#9bceec', '#b4dcd8', '#3b5fd6', '#9db1d9', '#d0621a', '#075711', '#1aeb29', '#bc458d', '#e0ca81', '#25e713', '#f38702', '#8f755d', '#7b8e98', '#010ef7', '#8e870f', '#f425b3', '#1cb501', '#d4a2a8', '#87b63a', '#309eb5', '#6ec4e8', '#8b67c5', '#bbb036', '#90f5ba', '#dd529d', '#9e840c', '#230607', '#6ddefc', '#c63768', '#2fbd9a']
####### GET SOME SIGNIFICANT PATHS, options occurring together
nfif = fif[fif['counts'] > 1]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax.id = 1
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
rnxn
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 761 | 755 | 200 | Strongly agree to Revising and Updating | Strongly agree to Policymaking | Strongly agree to Revising and Updating Strong... | 761 755 | #66f804 | 231 | 278 | 0.865801 | 0.719424 | 0.622878 |
| 0 | 759 | 754 | 33 | Strongly disagree to Revising and Updating | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating Str... | 759 754 | #5a05c9 | 41 | 45 | 0.804878 | 0.733333 | 0.590244 |
| 10 | 760 | 756 | 254 | Agree to Revising and Updating | Agree to Policymaking | Agree to Revising and Updating Agree to Policy... | 760 756 | #811fed | 353 | 318 | 0.719547 | 0.798742 | 0.574732 |
| 20 | 762 | 758 | 17 | Disagree to Revising and Updating | Disagree to Policymaking | Disagree to Revising and Updating Disagree to ... | 762 758 | #c3b538 | 38 | 39 | 0.447368 | 0.435897 | 0.195007 |
| 18 | 763 | 757 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... | 763 757 | #2715b7 | 83 | 68 | 0.385542 | 0.470588 | 0.181432 |
| 13 | 763 | 756 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... | 763 756 | #7069f4 | 83 | 318 | 0.361446 | 0.094340 | 0.034099 |
| 5 | 760 | 755 | 56 | Agree to Revising and Updating | Strongly agree to Policymaking | Agree to Revising and Updating Strongly agree ... | 760 755 | #7666c3 | 353 | 278 | 0.158640 | 0.201439 | 0.031956 |
| 15 | 760 | 757 | 24 | Agree to Revising and Updating | Neutral towards to Policymaking | Agree to Revising and Updating Neutral towards... | 760 757 | #8a45b6 | 353 | 68 | 0.067989 | 0.352941 | 0.023996 |
| 21 | 763 | 758 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... | 763 758 | #cd0cb2 | 83 | 39 | 0.084337 | 0.179487 | 0.015137 |
| 19 | 760 | 758 | 13 | Agree to Revising and Updating | Disagree to Policymaking | Agree to Revising and Updating Disagree to Pol... | 760 758 | #035aa2 | 353 | 39 | 0.036827 | 0.333333 | 0.012276 |
| 11 | 761 | 756 | 22 | Strongly agree to Revising and Updating | Agree to Policymaking | Strongly agree to Revising and Updating Agree ... | 761 756 | #3312aa | 231 | 318 | 0.095238 | 0.069182 | 0.006589 |
| 8 | 763 | 755 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... | 763 755 | #86db2b | 83 | 278 | 0.144578 | 0.043165 | 0.006241 |
| 12 | 762 | 756 | 8 | Disagree to Revising and Updating | Agree to Policymaking | Disagree to Revising and Updating Agree to Pol... | 762 756 | #1a14e7 | 38 | 318 | 0.210526 | 0.025157 | 0.005296 |
| 3 | 762 | 754 | 3 | Disagree to Revising and Updating | Strongly disagree to Policymaking | Disagree to Revising and Updating Strongly dis... | 762 754 | #34ff21 | 38 | 45 | 0.078947 | 0.066667 | 0.005263 |
| 17 | 762 | 757 | 3 | Disagree to Revising and Updating | Neutral towards to Policymaking | Disagree to Revising and Updating Neutral towa... | 762 757 | #ea3dc5 | 38 | 68 | 0.078947 | 0.044118 | 0.003483 |
| 7 | 762 | 755 | 6 | Disagree to Revising and Updating | Strongly agree to Policymaking | Disagree to Revising and Updating Strongly agr... | 762 755 | #336d1e | 38 | 278 | 0.157895 | 0.021583 | 0.003408 |
| 16 | 761 | 757 | 6 | Strongly agree to Revising and Updating | Neutral towards to Policymaking | Strongly agree to Revising and Updating Neutra... | 761 757 | #2e3c37 | 231 | 68 | 0.025974 | 0.088235 | 0.002292 |
| 1 | 760 | 754 | 5 | Agree to Revising and Updating | Strongly disagree to Policymaking | Agree to Revising and Updating Strongly disagr... | 760 754 | #af8146 | 353 | 45 | 0.014164 | 0.111111 | 0.001574 |
| 14 | 759 | 757 | 2 | Strongly disagree to Revising and Updating | Neutral towards to Policymaking | Strongly disagree to Revising and Updating Neu... | 759 757 | #26e722 | 41 | 68 | 0.048780 | 0.029412 | 0.001435 |
| 2 | 761 | 754 | 2 | Strongly agree to Revising and Updating | Strongly disagree to Policymaking | Strongly agree to Revising and Updating Strong... | 761 754 | #e82a28 | 231 | 45 | 0.008658 | 0.044444 | 0.000385 |
| 4 | 759 | 755 | 2 | Strongly disagree to Revising and Updating | Strongly agree to Policymaking | Strongly disagree to Revising and Updating Str... | 759 755 | #ce7d3e | 41 | 278 | 0.048780 | 0.007194 | 0.000351 |
| 9 | 759 | 756 | 2 | Strongly disagree to Revising and Updating | Agree to Policymaking | Strongly disagree to Revising and Updating Agr... | 759 756 | #56f551 | 41 | 318 | 0.048780 | 0.006289 | 0.000307 |
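The `p1p2` score above is the pair count divided by each option's marginal total, multiplied together, so pairs that dominate both margins rank highest. The top row of the table can be reproduced by hand (counts taken from the table):

```python
# Joint-proportion score for the top pair: "Strongly agree" on both items.
count = 200      # participants choosing both options together
total_a = 231    # marginal total for Strongly agree to Revising and Updating
total_b = 278    # marginal total for Strongly agree to Policymaking
p1 = count / total_a
p2 = count / total_b
print(round(p1 * p2, 6))  # 0.622878, matching the table
```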
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.style.use('classic')
df_graph = rnxn
df_graph['From'] = df_graph[3].map(str)+' '+ ((df_graph['p1p2']*100).round(2)).map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts']
colors=cls['colour']
weights = df_graph['counts']
G = networkx.from_pandas_edgelist(
df_graph, source="From", target="To", edge_attr="Count"
)
####
# dynamic node sizes: scale each node's degree for plotting
scale = 100
d = dict(G.degree)
# updating dict: multiply each degree by the scale factor
d.update((x, scale*y) for x, y in d.items())
####
plt.figure(figsize=(15,15))
plt.rcParams['figure.facecolor'] = 'white'
graph_pos = networkx.spring_layout(G)
networkx.draw_networkx(G, pos=networkx.nx_pydot.graphviz_layout(G), edge_color=colors,
                       node_color='blue', node_size=list(d.values()), alpha=1,
                       width=weights*0.1, arrows=False, with_labels=True,
                       font_size=10, font_family='sans-serif'
                       ) # draw_networkx returns None, so do not reassign it to G
plt.tight_layout()
plt.savefig('network_4_5.png', dpi=300)
###### ALTERNATIVE METHOD, WITHOUT ZIGZAG - TOP PATHS
xor = pd.DataFrame(y).reset_index()
del xor['index']
del xor[0]
all_columns = list(xor.columns)
xor['count'] = 1
xor = xor.groupby(all_columns)['count'].sum().reset_index()
#xor = xor[xor['count'] > 1]
xor
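This alternative method counts how often each complete path occurs: identical rows are grouped and their dummy counts summed. A toy version of that step, with hypothetical column names:

```python
# Toy path-frequency count: a dummy count of 1 per row, summed over
# identical (a, b) rows, gives each full path's frequency.
import pandas as pd

paths = pd.DataFrame({'a': ['755', '755', '756'], 'b': ['761', '761', '760']})
paths['count'] = 1
top = paths.groupby(['a', 'b'])['count'].sum().reset_index()
print(top.values.tolist())  # [['755', '761', 2], ['756', '760', 1]]
```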
nxor = xor[all_columns].copy()
for column in all_columns:
nxor[column] = nxor[column].map(str)
nxor[column] = nxor[column].map(inv_map)
nxor
one_xor = pd.concat([xor, nxor], axis=1)
one_xor.sort_values(['count'], ascending=False, inplace=True)
#one_xor[one_xor['count'] > 1]
one_xor
| | 1 | 2 | count | 1 (label) | 2 (label) |
|---|---|---|---|---|---|
| 10 | 756 | 760 | 254 | Agree to Policymaking | Agree to Revising and Updating |
| 6 | 755 | 761 | 200 | Strongly agree to Policymaking | Strongly agree to Revising and Updating |
| 5 | 755 | 760 | 56 | Strongly agree to Policymaking | Agree to Revising and Updating |
| 0 | 754 | 759 | 33 | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating |
| 18 | 757 | 763 | 32 | Neutral towards to Policymaking | Neutral towards to Revising and Updating |
| 13 | 756 | 763 | 30 | Agree to Policymaking | Neutral towards to Revising and Updating |
| 15 | 757 | 760 | 24 | Neutral towards to Policymaking | Agree to Revising and Updating |
| 11 | 756 | 761 | 22 | Agree to Policymaking | Strongly agree to Revising and Updating |
| 22 | 758 | 762 | 17 | Disagree to Policymaking | Disagree to Revising and Updating |
| 20 | 758 | 760 | 13 | Disagree to Policymaking | Agree to Revising and Updating |
| 8 | 755 | 763 | 12 | Strongly agree to Policymaking | Neutral towards to Revising and Updating |
| 12 | 756 | 762 | 8 | Agree to Policymaking | Disagree to Revising and Updating |
| 23 | 758 | 763 | 7 | Disagree to Policymaking | Neutral towards to Revising and Updating |
| 7 | 755 | 762 | 6 | Strongly agree to Policymaking | Disagree to Revising and Updating |
| 16 | 757 | 761 | 6 | Neutral towards to Policymaking | Strongly agree to Revising and Updating |
| 1 | 754 | 760 | 5 | Strongly disagree to Policymaking | Agree to Revising and Updating |
| 17 | 757 | 762 | 3 | Neutral towards to Policymaking | Disagree to Revising and Updating |
| 3 | 754 | 762 | 3 | Strongly disagree to Policymaking | Disagree to Revising and Updating |
| 9 | 756 | 759 | 2 | Agree to Policymaking | Strongly disagree to Revising and Updating |
| 14 | 757 | 759 | 2 | Neutral towards to Policymaking | Strongly disagree to Revising and Updating |
| 4 | 755 | 759 | 2 | Strongly agree to Policymaking | Strongly disagree to Revising and Updating |
| 2 | 754 | 761 | 2 | Strongly disagree to Policymaking | Strongly agree to Revising and Updating |
| 19 | 758 | 759 | 1 | Disagree to Policymaking | Strongly disagree to Revising and Updating |
| 21 | 758 | 761 | 1 | Disagree to Policymaking | Strongly agree to Revising and Updating |
rnxn_45 = df_graph.copy()
xor_45 = one_xor.copy()
select= ['5', '3']
nndf = BNdf[BNdf['Group'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
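The `categories` mapping built above pairs each option with a consecutive integer code starting just past the id range, so option codes never collide with participant ids. A minimal sketch (the `len_ids` value is hypothetical):

```python
# Minimal sketch of the code assignment: options get consecutive integer
# codes starting just past the largest participant id.
options = ['Policymaking', 'Revising and Updating']
len_ids = 754  # hypothetical: one past the number of participant ids
ranges = list(range(len_ids, len_ids + len(options)))
categories = dict(zip(options, ranges))
print(categories)  # {'Policymaking': 754, 'Revising and Updating': 755}
```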
sources['codes'] = sources['Option'].map(categories)
xtt=pd.DataFrame()
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# per-id value counts of the option codes
wo = []
for i in range(len(counts['xcodes'])):
    wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
    """Return the list of consecutive (source, target) pairs from each path in `seq`."""
    seq_int = [list(map(int, x)) for x in seq]
    x = []
    y = []
    for i in seq_int:
        for j, k in zip(i, i[1:]):
            x.append(j)
            y.append(k)
    return list(zip(x, y))
# get a path graph
y = []
for i in range(len(path_list)):
y.append(list(path_list[i].split(',')))
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] # keep only option codes, dropping participant-id rows
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']].copy() # copy to avoid SettingWithCopyWarning below
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = pd.DataFrame(fif['connections'].unique())
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF)) # full 24-bit colour range
print(colour)
['#4627a6', '#3d92ec', '#a37b58', '#384642', '#db94a0', '#d50d08', '#d45499', '#d42b19', '#e52c73', '#2ff75c', '#c84758', '#67c229', '#f9434a', '#3e1a8b', '#94abf0', '#9cc961', '#a31ae2', '#bfdf83', '#aafe4d', '#fa7366', '#a84adf', '#6b875c', '#5bd6fc', '#ce1634', '#1e451a', '#511623', '#b72e1e', '#db63c2', '#c5a389']
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
| | 1 | 2 | counts | 3 | 4 | label | connections | colour |
|---|---|---|---|---|---|---|---|---|
| 0 | 758 | 766 | 1 | Strongly disagree to dissemination of GK | Disagree to Revising and Updating | Strongly disagree to dissemination of GK Disag... | 758 766 | #4627a6 |
| 1 | 759 | 764 | 1 | Agree to dissemination of GK | Agree to Revising and Updating | Agree to dissemination of GK Agree to Revising... | 759 764 | #3d92ec |
| 2 | 760 | 761 | 1 | Strongly agree to dissemination of GK | Neutral towards to dissemination of GK | Strongly agree to dissemination of GK Neutral ... | 760 761 | #a37b58 |
| 3 | 760 | 762 | 1 | Strongly agree to dissemination of GK | Disagree to dissemination of GK | Strongly agree to dissemination of GK Disagree... | 760 762 | #384642 |
| 4 | 760 | 765 | 1 | Strongly agree to dissemination of GK | Strongly agree to Revising and Updating | Strongly agree to dissemination of GK Strongly... | 760 765 | #db94a0 |
| 5 | 763 | 758 | 32 | Strongly disagree to Revising and Updating | Strongly disagree to dissemination of GK | Strongly disagree to Revising and Updating Str... | 763 758 | #d50d08 |
| 6 | 763 | 760 | 5 | Strongly disagree to Revising and Updating | Strongly agree to dissemination of GK | Strongly disagree to Revising and Updating Str... | 763 760 | #d45499 |
| 7 | 763 | 761 | 2 | Strongly disagree to Revising and Updating | Neutral towards to dissemination of GK | Strongly disagree to Revising and Updating Neu... | 763 761 | #d42b19 |
| 8 | 763 | 762 | 2 | Strongly disagree to Revising and Updating | Disagree to dissemination of GK | Strongly disagree to Revising and Updating Dis... | 763 762 | #e52c73 |
| 9 | 764 | 758 | 9 | Agree to Revising and Updating | Strongly disagree to dissemination of GK | Agree to Revising and Updating Strongly disagr... | 764 758 | #2ff75c |
| 10 | 764 | 759 | 223 | Agree to Revising and Updating | Agree to dissemination of GK | Agree to Revising and Updating Agree to dissem... | 764 759 | #c84758 |
| 11 | 764 | 760 | 89 | Agree to Revising and Updating | Strongly agree to dissemination of GK | Agree to Revising and Updating Strongly agree ... | 764 760 | #67c229 |
| 12 | 764 | 761 | 17 | Agree to Revising and Updating | Neutral towards to dissemination of GK | Agree to Revising and Updating Neutral towards... | 764 761 | #f9434a |
| 13 | 764 | 762 | 13 | Agree to Revising and Updating | Disagree to dissemination of GK | Agree to Revising and Updating Disagree to dis... | 764 762 | #3e1a8b |
| 14 | 765 | 758 | 5 | Strongly agree to Revising and Updating | Strongly disagree to dissemination of GK | Strongly agree to Revising and Updating Strong... | 765 758 | #94abf0 |
| 15 | 765 | 759 | 61 | Strongly agree to Revising and Updating | Agree to dissemination of GK | Strongly agree to Revising and Updating Agree ... | 765 759 | #9cc961 |
| 16 | 765 | 760 | 161 | Strongly agree to Revising and Updating | Strongly agree to dissemination of GK | Strongly agree to Revising and Updating Strong... | 765 760 | #a31ae2 |
| 17 | 765 | 761 | 2 | Strongly agree to Revising and Updating | Neutral towards to dissemination of GK | Strongly agree to Revising and Updating Neutra... | 765 761 | #bfdf83 |
| 18 | 765 | 762 | 2 | Strongly agree to Revising and Updating | Disagree to dissemination of GK | Strongly agree to Revising and Updating Disagr... | 765 762 | #aafe4d |
| 19 | 766 | 758 | 3 | Disagree to Revising and Updating | Strongly disagree to dissemination of GK | Disagree to Revising and Updating Strongly dis... | 766 758 | #fa7366 |
| 20 | 766 | 759 | 11 | Disagree to Revising and Updating | Agree to dissemination of GK | Disagree to Revising and Updating Agree to dis... | 766 759 | #a84adf |
| 21 | 766 | 760 | 7 | Disagree to Revising and Updating | Strongly agree to dissemination of GK | Disagree to Revising and Updating Strongly agr... | 766 760 | #6b875c |
| 22 | 766 | 761 | 2 | Disagree to Revising and Updating | Neutral towards to dissemination of GK | Disagree to Revising and Updating Neutral towa... | 766 761 | #5bd6fc |
| 23 | 766 | 762 | 14 | Disagree to Revising and Updating | Disagree to dissemination of GK | Disagree to Revising and Updating Disagree to ... | 766 762 | #ce1634 |
| 24 | 767 | 758 | 4 | Neutral towards to Revising and Updating | Strongly disagree to dissemination of GK | Neutral towards to Revising and Updating Stron... | 767 758 | #1e451a |
| 25 | 767 | 759 | 40 | Neutral towards to Revising and Updating | Agree to dissemination of GK | Neutral towards to Revising and Updating Agree... | 767 759 | #511623 |
| 26 | 767 | 760 | 11 | Neutral towards to Revising and Updating | Strongly agree to dissemination of GK | Neutral towards to Revising and Updating Stron... | 767 760 | #b72e1e |
| 27 | 767 | 761 | 24 | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Neutral towards to Revising and Updating Neutr... | 767 761 | #db63c2 |
| 28 | 767 | 762 | 4 | Neutral towards to Revising and Updating | Disagree to dissemination of GK | Neutral towards to Revising and Updating Disag... | 767 762 | #c5a389 |
### keep only frequent transitions (counts > 10) for the plot
fif['counts'] = fif['counts'].map(int)
nfif = fif[fif['counts'] > 10]
### new plot
sources = list(nfif[1])
targets = list(nfif[2])
values = list(nfif['counts'])
labels = list(nfif['label'])
colours = list(nfif['colour'])
unique_list = nfif['label'].unique()
nodified = nodify(node_names=unique_list)
nodified
###
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 20,
thickness = 5,
line = dict(color = 'red', width = 1),
label = labels,
customdata = labels,
hovertemplate='Source has total value %{value}<extra></extra>',
color = 'blue',
),
link = dict(
source = sources, # indices correspond to labels, eg A1, A2, A2, B1, ...
target = targets,
value = values,
customdata = labels,
color = colours,
hovertemplate='Absolute count: %{value}'+
'<br />Option: %{customdata}<extra></extra>'
))])
# apply the layout directly to the figure; a standalone go.Layout(...) is discarded
fig = fig.update_layout(title='Sankey plot',
                        hoverlabel=dict(font=dict(family='sans-serif', size=100)),
                        margin=dict(t=100))
fig.write_html("/home/manu10/Downloads/iglas_work/sankey_5_3.html")
fig.show()
def nodify(node_names):
    # note: this cell must run before the nodify(...) call above
    # unique first characters of the names; each gets its own column
    ends = sorted(list(set([e[0] for e in node_names])))
    # horizontal interval between node columns
    steps = 1/4
    # x-value for each unique name beginning,
    # for input as node position
    nodes_x = {}
    xVal = 0
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form (one y per node)
    x_values = [nodes_x[n[0]] for n in node_names]
    y_values = [x*0.03 for x in range(1, len(x_values)+1)]
    return x_values, y_values
# map colours to categories
import random
# generate random colours
amount = len(npaths['name'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF)) # full 24-bit colour range
print(colour)
['#e11868', '#89a8dc', '#2a1e6c', '#aeb62c', '#3a9f7c', '#c186b0', '#9d7b62', '#bd9092', '#f1fcec', '#4d31d5', '#b562ff', '#0688fa', '#0e7cb4', '#1c918e', '#6a574d', '#100fa9', '#1cb909', '#bdfc8e', '#bbe4b0', '#c873ef', '#b61b7a', '#8a6d0e', '#3ae680', '#3a4bcc', '#795e17', '#da280b', '#628287', '#3a498e', '#4ceb99', '#6c0578', '#6d6b5c', '#4274e5', '#62de0e', '#6afef7', '#79028c', '#c1c4b5', '#df7af4', '#9e0e16', '#333be3', '#053b3b', '#a8bce1', '#e4adef', '#065eb9', '#2455b4', '#68815b', '#3fcf84', '#142fbb', '#b341e5', '#92a4f6', '#d45b0b', '#3cec4f', '#8389a4', '#083aaf', '#3c9a2f', '#4f071b', '#0c9399', '#dc771f', '#6e8fe0', '#5e2794', '#f3b95f', '#c94587', '#2ce609', '#856b70', '#f94375', '#85ecd9', '#c3e150', '#2b2e10', '#cb5bca', '#d559cb', '#cfcb01', '#2a0516', '#62e0b2', '#0f1566', '#b8e03a', '#7179f7', '#051a01', '#0f6d45', '#0ce4e8', '#9a3584']
####### GET SOME SIGNIFICANT PATHS, options occurring together
nfif = fif[fif['counts'] > 1]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax.id = 1
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
rnxn
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 763 | 758 | 32 | Strongly disagree to Revising and Updating | Strongly disagree to dissemination of GK | Strongly disagree to Revising and Updating Str... | 763 758 | #d50d08 | 41 | 56 | 0.780488 | 0.571429 | 0.445993 |
| 20 | 764 | 759 | 223 | Agree to Revising and Updating | Agree to dissemination of GK | Agree to Revising and Updating Agree to dissem... | 764 759 | #c84758 | 353 | 339 | 0.631728 | 0.657817 | 0.415562 |
| 7 | 765 | 760 | 161 | Strongly agree to Revising and Updating | Strongly agree to dissemination of GK | Strongly agree to Revising and Updating Strong... | 765 760 | #a31ae2 | 231 | 276 | 0.696970 | 0.583333 | 0.406566 |
| 18 | 766 | 762 | 14 | Disagree to Revising and Updating | Disagree to dissemination of GK | Disagree to Revising and Updating Disagree to ... | 766 762 | #ce1634 | 38 | 35 | 0.368421 | 0.400000 | 0.147368 |
| 14 | 767 | 761 | 24 | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Neutral towards to Revising and Updating Neutr... | 767 761 | #db63c2 | 83 | 48 | 0.289157 | 0.500000 | 0.144578 |
| 6 | 764 | 760 | 89 | Agree to Revising and Updating | Strongly agree to dissemination of GK | Agree to Revising and Updating Strongly agree ... | 764 760 | #67c229 | 353 | 276 | 0.252125 | 0.322464 | 0.081301 |
| 23 | 767 | 759 | 40 | Neutral towards to Revising and Updating | Agree to dissemination of GK | Neutral towards to Revising and Updating Agree... | 767 759 | #511623 | 83 | 339 | 0.481928 | 0.117994 | 0.056865 |
| 21 | 765 | 759 | 61 | Strongly agree to Revising and Updating | Agree to dissemination of GK | Strongly agree to Revising and Updating Agree ... | 765 759 | #9cc961 | 231 | 339 | 0.264069 | 0.179941 | 0.047517 |
| 11 | 764 | 761 | 17 | Agree to Revising and Updating | Neutral towards to dissemination of GK | Agree to Revising and Updating Neutral towards... | 764 761 | #f9434a | 353 | 48 | 0.048159 | 0.354167 | 0.017056 |
| 16 | 764 | 762 | 13 | Agree to Revising and Updating | Disagree to dissemination of GK | Agree to Revising and Updating Disagree to dis... | 764 762 | #3e1a8b | 353 | 35 | 0.036827 | 0.371429 | 0.013679 |
| 22 | 766 | 759 | 11 | Disagree to Revising and Updating | Agree to dissemination of GK | Disagree to Revising and Updating Agree to dis... | 766 759 | #a84adf | 38 | 339 | 0.289474 | 0.032448 | 0.009393 |
| 19 | 767 | 762 | 4 | Neutral towards to Revising and Updating | Disagree to dissemination of GK | Neutral towards to Revising and Updating Disag... | 767 762 | #c5a389 | 83 | 35 | 0.048193 | 0.114286 | 0.005508 |
| 9 | 767 | 760 | 11 | Neutral towards to Revising and Updating | Strongly agree to dissemination of GK | Neutral towards to Revising and Updating Stron... | 767 760 | #b72e1e | 83 | 276 | 0.132530 | 0.039855 | 0.005282 |
| 8 | 766 | 760 | 7 | Disagree to Revising and Updating | Strongly agree to dissemination of GK | Disagree to Revising and Updating Strongly agr... | 766 760 | #6b875c | 38 | 276 | 0.184211 | 0.025362 | 0.004672 |
| 3 | 766 | 758 | 3 | Disagree to Revising and Updating | Strongly disagree to dissemination of GK | Disagree to Revising and Updating Strongly dis... | 766 758 | #fa7366 | 38 | 56 | 0.078947 | 0.053571 | 0.004229 |
| 1 | 764 | 758 | 9 | Agree to Revising and Updating | Strongly disagree to dissemination of GK | Agree to Revising and Updating Strongly disagr... | 764 758 | #2ff75c | 353 | 56 | 0.025496 | 0.160714 | 0.004098 |
| 4 | 767 | 758 | 4 | Neutral towards to Revising and Updating | Strongly disagree to dissemination of GK | Neutral towards to Revising and Updating Stron... | 767 758 | #1e451a | 83 | 56 | 0.048193 | 0.071429 | 0.003442 |
| 15 | 763 | 762 | 2 | Strongly disagree to Revising and Updating | Disagree to dissemination of GK | Strongly disagree to Revising and Updating Dis... | 763 762 | #e52c73 | 41 | 35 | 0.048780 | 0.057143 | 0.002787 |
| 5 | 763 | 760 | 5 | Strongly disagree to Revising and Updating | Strongly agree to dissemination of GK | Strongly disagree to Revising and Updating Str... | 763 760 | #d45499 | 41 | 276 | 0.121951 | 0.018116 | 0.002209 |
| 13 | 766 | 761 | 2 | Disagree to Revising and Updating | Neutral towards to dissemination of GK | Disagree to Revising and Updating Neutral towa... | 766 761 | #5bd6fc | 38 | 48 | 0.052632 | 0.041667 | 0.002193 |
| 10 | 763 | 761 | 2 | Strongly disagree to Revising and Updating | Neutral towards to dissemination of GK | Strongly disagree to Revising and Updating Neu... | 763 761 | #d42b19 | 41 | 48 | 0.048780 | 0.041667 | 0.002033 |
| 2 | 765 | 758 | 5 | Strongly agree to Revising and Updating | Strongly disagree to dissemination of GK | Strongly agree to Revising and Updating Strong... | 765 758 | #94abf0 | 231 | 56 | 0.021645 | 0.089286 | 0.001933 |
| 17 | 765 | 762 | 2 | Strongly agree to Revising and Updating | Disagree to dissemination of GK | Strongly agree to Revising and Updating Disagr... | 765 762 | #aafe4d | 231 | 35 | 0.008658 | 0.057143 | 0.000495 |
| 12 | 765 | 761 | 2 | Strongly agree to Revising and Updating | Neutral towards to dissemination of GK | Strongly agree to Revising and Updating Neutra... | 765 761 | #bfdf83 | 231 | 48 | 0.008658 | 0.041667 | 0.000361 |
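Reading the table: the p1p2 column is the product of the two directional proportions, where p1 is the shared count divided by the first category's total and p2 is the shared count divided by the second category's total. A quick check against the top row (223 shared responses, category totals 353 and 339):

```python
# Verifying the p1p2 column against the top row of the table above
count, total_1, total_2 = 223, 353, 339
p1 = count / total_1   # 0.631728...
p2 = count / total_2   # 0.657817...
p1p2 = p1 * p2         # 0.415562..., matching the table
```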
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.style.use('classic')
df_graph = rnxn
df_graph['From'] = df_graph[3].map(str) + ' ' + ((df_graph['p1p2'] * 100).round(2)).map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts']
colors = cls['colour']
weights = df_graph['counts']
G = networkx.from_pandas_edgelist(
    df_graph, source="From", target="To", edge_attr="Count"
)
# Dynamic node sizes: scale each node's degree for plotting
scale = 100
d = dict(G.degree)
d.update((x, scale * y) for x, y in d.items())
plt.figure(figsize=(15, 15))
plt.rcParams['figure.facecolor'] = 'white'
# draw_networkx draws onto the current figure and returns None,
# so its result must not be assigned back to G
networkx.draw_networkx(G, pos=networkx.nx_pydot.graphviz_layout(G), edge_color=colors,
                       node_color='blue', alpha=1, width=weights * 0.1, arrows=False,
                       with_labels=True, font_size=10, font_family='sans-serif')
plt.tight_layout()
plt.savefig('network_5_3.png', dpi=300)
###### ALTERNATIVE METHOD, WITHOUT ZIGZAG - TOP PATHS
xor = pd.DataFrame(y).reset_index()
del xor['index']
del xor[0]
all_columns = list(xor.columns)
xor['count'] = 1
xor = xor.groupby(all_columns)['count'].sum().reset_index()
#xor = xor[xor['count'] > 1]
xor
nxor = xor[all_columns].copy()
for column in all_columns:
    nxor[column] = nxor[column].map(str)
    nxor[column] = nxor[column].map(inv_map)
nxor
one_xor = pd.concat([xor, nxor], axis=1)
one_xor.sort_values(['count'], ascending=False, inplace=True)
#one_xor[one_xor['count'] > 1]
one_xor
| | 1 | 2 | count | 1 | 2 |
|---|---|---|---|---|---|
| 5 | 759 | 764 | 223 | Agree to dissemination of GK | Agree to Revising and Updating |
| 11 | 760 | 765 | 161 | Strongly agree to dissemination of GK | Strongly agree to Revising and Updating |
| 10 | 760 | 764 | 89 | Strongly agree to dissemination of GK | Agree to Revising and Updating |
| 6 | 759 | 765 | 61 | Agree to dissemination of GK | Strongly agree to Revising and Updating |
| 8 | 759 | 767 | 40 | Agree to dissemination of GK | Neutral towards to Revising and Updating |
| 0 | 758 | 763 | 32 | Strongly disagree to dissemination of GK | Strongly disagree to Revising and Updating |
| 18 | 761 | 767 | 24 | Neutral towards to dissemination of GK | Neutral towards to Revising and Updating |
| 15 | 761 | 764 | 17 | Neutral towards to dissemination of GK | Agree to Revising and Updating |
| 22 | 762 | 766 | 14 | Disagree to dissemination of GK | Disagree to Revising and Updating |
| 20 | 762 | 764 | 13 | Disagree to dissemination of GK | Agree to Revising and Updating |
| 13 | 760 | 767 | 11 | Strongly agree to dissemination of GK | Neutral towards to Revising and Updating |
| 7 | 759 | 766 | 11 | Agree to dissemination of GK | Disagree to Revising and Updating |
| 1 | 758 | 764 | 9 | Strongly disagree to dissemination of GK | Agree to Revising and Updating |
| 12 | 760 | 766 | 7 | Strongly agree to dissemination of GK | Disagree to Revising and Updating |
| 9 | 760 | 763 | 5 | Strongly agree to dissemination of GK | Strongly disagree to Revising and Updating |
| 2 | 758 | 765 | 5 | Strongly disagree to dissemination of GK | Strongly agree to Revising and Updating |
| 4 | 758 | 767 | 4 | Strongly disagree to dissemination of GK | Neutral towards to Revising and Updating |
| 23 | 762 | 767 | 4 | Disagree to dissemination of GK | Neutral towards to Revising and Updating |
| 3 | 758 | 766 | 3 | Strongly disagree to dissemination of GK | Disagree to Revising and Updating |
| 14 | 761 | 763 | 2 | Neutral towards to dissemination of GK | Strongly disagree to Revising and Updating |
| 16 | 761 | 765 | 2 | Neutral towards to dissemination of GK | Strongly agree to Revising and Updating |
| 17 | 761 | 766 | 2 | Neutral towards to dissemination of GK | Disagree to Revising and Updating |
| 19 | 762 | 763 | 2 | Disagree to dissemination of GK | Strongly disagree to Revising and Updating |
| 21 | 762 | 765 | 2 | Disagree to dissemination of GK | Strongly agree to Revising and Updating |
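The top-paths counting above boils down to a groupby over all response columns with a constant count column summed. A toy sketch of that step (two illustrative response-pair rows standing in for the real `y` data):

```python
import pandas as pd

# Each row is one participant's (item_1, item_2) response-code pair.
paths = pd.DataFrame({1: [759, 759, 760], 2: [764, 764, 765]})
paths['count'] = 1
# Grouping on all columns and summing yields the frequency of each path.
top = paths.groupby([1, 2])['count'].sum().reset_index()
top.sort_values('count', ascending=False, inplace=True)
```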
rnxn_53 = df_graph.copy()
xor_53 = one_xor.copy()
rnxn_combined = pd.concat([rnxn_34, rnxn_45, rnxn_53], axis=0)
rnxn_combined.sort_values('p1p2', ascending=False, inplace=True)
rnxn_combined
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 | From | To | Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 761 | 755 | 200 | Strongly agree to Revising and Updating | Strongly agree to Policymaking | Strongly agree to Revising and Updating Strong... | 761 755 | #66f804 | 231 | 278 | 0.865801 | 0.719424 | 0.622878 | Strongly agree to Revising and Updating 62.29 | Strongly agree to Policymaking | 200 |
| 0 | 759 | 754 | 33 | Strongly disagree to Revising and Updating | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating Str... | 759 754 | #5a05c9 | 41 | 45 | 0.804878 | 0.733333 | 0.590244 | Strongly disagree to Revising and Updating 59.02 | Strongly disagree to Policymaking | 33 |
| 10 | 760 | 756 | 254 | Agree to Revising and Updating | Agree to Policymaking | Agree to Revising and Updating Agree to Policy... | 760 756 | #811fed | 353 | 318 | 0.719547 | 0.798742 | 0.574732 | Agree to Revising and Updating 57.47 | Agree to Policymaking | 254 |
| 4 | 765 | 761 | 196 | Strongly agree to Policymaking | Strongly agree to dissemination of GK | Strongly agree to Policymaking Strongly agree ... | 765 761 | #c52ce4 | 278 | 276 | 0.705036 | 0.710145 | 0.500678 | Strongly agree to Policymaking 50.07 | Strongly agree to dissemination of GK | 196 |
| 22 | 766 | 760 | 227 | Agree to Policymaking | Agree to dissemination of GK | Agree to Policymaking Agree to dissemination o... | 766 760 | #f6a227 | 318 | 339 | 0.713836 | 0.669617 | 0.477997 | Agree to Policymaking 47.8 | Agree to dissemination of GK | 227 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8 | 761 | 762 | 1 | Strongly agree to dissemination of GK | Neutral towards to dissemination of GK | Strongly agree to dissemination of GK Neutral ... | 761 762 | #f47556 | 276 | 48 | 0.003623 | 0.020833 | 0.000075 | Strongly agree to dissemination of GK 0.01 | Neutral towards to dissemination of GK | 1 |
| 1 | 760 | 764 | 1 | Agree to dissemination of GK | Strongly disagree to Policymaking | Agree to dissemination of GK Strongly disagree... | 760 764 | #7d8b1e | 339 | 45 | 0.002950 | 0.022222 | 0.000066 | Agree to dissemination of GK 0.01 | Strongly disagree to Policymaking | 1 |
| 0 | 759 | 766 | 1 | Strongly disagree to dissemination of GK | Agree to Policymaking | Strongly disagree to dissemination of GK Agree... | 759 766 | #54eb4e | 56 | 318 | 0.017857 | 0.003145 | 0.000056 | Strongly disagree to dissemination of GK 0.01 | Agree to Policymaking | 1 |
| 2 | 761 | 761 | 1 | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK | Strongly agree to dissemination of GK Strongly... | 761 761 | #a7b87e | 276 | 276 | 0.003623 | 0.003623 | 0.000013 | Strongly agree to dissemination of GK 0.0 | Strongly agree to dissemination of GK | 1 |
| 13 | 761 | 765 | 1 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to dissemination of GK Strongly... | 761 765 | #8236bb | 276 | 278 | 0.003623 | 0.003597 | 0.000013 | Strongly agree to dissemination of GK 0.0 | Strongly agree to Policymaking | 1 |
76 rows × 16 columns
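The combining step above is a plain row-wise `pd.concat` followed by ranking on the joint proportion `p1p2`, so the strongest attitude links across all three comparisons come first. A toy sketch (small frames standing in for `rnxn_34` / `rnxn_45` / `rnxn_53`):

```python
import pandas as pd

a = pd.DataFrame({'p1p2': [0.62, 0.05]})
b = pd.DataFrame({'p1p2': [0.57]})
# Stack the edge tables row-wise, then rank by joint proportion
combined = pd.concat([a, b], axis=0)
combined.sort_values('p1p2', ascending=False, inplace=True)
```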
xor_combined = pd.concat([xor_34, xor_45, xor_53], axis=0)
xor_combined
| | 1 | 2 | count | 1 | 2 |
|---|---|---|---|---|---|
| 7 | 760 | 766 | 227 | Agree to dissemination of GK | Agree to Policymaking |
| 11 | 761 | 765 | 196 | Strongly agree to dissemination of GK | Strongly agree to Policymaking |
| 6 | 760 | 765 | 70 | Agree to dissemination of GK | Strongly agree to Policymaking |
| 12 | 761 | 766 | 56 | Strongly agree to dissemination of GK | Agree to Policymaking |
| 0 | 759 | 764 | 33 | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking |
| ... | ... | ... | ... | ... | ... |
| 14 | 761 | 763 | 2 | Neutral towards to dissemination of GK | Strongly disagree to Revising and Updating |
| 16 | 761 | 765 | 2 | Neutral towards to dissemination of GK | Strongly agree to Revising and Updating |
| 17 | 761 | 766 | 2 | Neutral towards to dissemination of GK | Disagree to Revising and Updating |
| 19 | 762 | 763 | 2 | Disagree to dissemination of GK | Strongly disagree to Revising and Updating |
| 21 | 762 | 765 | 2 | Disagree to dissemination of GK | Strongly agree to Revising and Updating |
72 rows × 5 columns
rnxn_triple.sort_values('p1p2', ascending=False, inplace=True)
rnxn_triple
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 | From | To | Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 773 | 767 | 200 | Strongly agree to Revising and Updating | Strongly agree to Policymaking | Strongly agree to Revising and Updating Strong... | 773 767 | #eebc32 | 231 | 278 | 0.865801 | 0.719424 | 0.622878 | Strongly agree to Revising and Updating 62.29 | Strongly agree to Policymaking | 200 |
| 24 | 771 | 766 | 33 | Strongly disagree to Revising and Updating | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating Str... | 771 766 | #bfa2fe | 41 | 45 | 0.804878 | 0.733333 | 0.590244 | Strongly disagree to Revising and Updating 59.02 | Strongly disagree to Policymaking | 33 |
| 34 | 772 | 768 | 254 | Agree to Revising and Updating | Agree to Policymaking | Agree to Revising and Updating Agree to Policy... | 772 768 | #c7d626 | 353 | 318 | 0.719547 | 0.798742 | 0.574732 | Agree to Revising and Updating 57.47 | Agree to Policymaking | 254 |
| 11 | 767 | 763 | 196 | Strongly agree to Policymaking | Strongly agree to dissemination of GK | Strongly agree to Policymaking Strongly agree ... | 767 763 | #da3ab2 | 278 | 276 | 0.705036 | 0.710145 | 0.500678 | Strongly agree to Policymaking 50.07 | Strongly agree to dissemination of GK | 196 |
| 7 | 768 | 762 | 227 | Agree to Policymaking | Agree to dissemination of GK | Agree to Policymaking Agree to dissemination o... | 768 762 | #cff8d0 | 318 | 339 | 0.713836 | 0.669617 | 0.477997 | Agree to Policymaking 47.8 | Agree to dissemination of GK | 227 |
| 0 | 766 | 761 | 33 | Strongly disagree to Policymaking | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking Strongly dis... | 766 761 | #fa3e0d | 45 | 56 | 0.733333 | 0.589286 | 0.432143 | Strongly disagree to Policymaking 43.21 | Strongly disagree to dissemination of GK | 33 |
| 44 | 774 | 770 | 17 | Disagree to Revising and Updating | Disagree to Policymaking | Disagree to Revising and Updating Disagree to ... | 774 770 | #ba1e2b | 38 | 39 | 0.447368 | 0.435897 | 0.195007 | Disagree to Revising and Updating 19.5 | Disagree to Policymaking | 17 |
| 42 | 775 | 769 | 32 | Neutral towards to Revising and Updating | Neutral towards to Policymaking | Neutral towards to Revising and Updating Neutr... | 775 769 | #5f6053 | 83 | 68 | 0.385542 | 0.470588 | 0.181432 | Neutral towards to Revising and Updating 18.14 | Neutral towards to Policymaking | 32 |
| 22 | 769 | 764 | 23 | Neutral towards to Policymaking | Neutral towards to dissemination of GK | Neutral towards to Policymaking Neutral toward... | 769 764 | #313f23 | 68 | 48 | 0.338235 | 0.479167 | 0.162071 | Neutral towards to Policymaking 16.21 | Neutral towards to dissemination of GK | 23 |
| 19 | 770 | 765 | 12 | Disagree to Policymaking | Disagree to dissemination of GK | Disagree to Policymaking Disagree to dissemina... | 770 765 | #17e257 | 39 | 35 | 0.307692 | 0.342857 | 0.105495 | Disagree to Policymaking 10.55 | Disagree to dissemination of GK | 12 |
| 6 | 767 | 762 | 70 | Strongly agree to Policymaking | Agree to dissemination of GK | Strongly agree to Policymaking Agree to dissem... | 767 762 | #65d117 | 278 | 339 | 0.251799 | 0.206490 | 0.051994 | Strongly agree to Policymaking 5.2 | Agree to dissemination of GK | 70 |
| 12 | 768 | 763 | 56 | Agree to Policymaking | Strongly agree to dissemination of GK | Agree to Policymaking Strongly agree to dissem... | 768 763 | #62011b | 318 | 276 | 0.176101 | 0.202899 | 0.035731 | Agree to Policymaking 3.57 | Strongly agree to dissemination of GK | 56 |
| 37 | 775 | 768 | 30 | Neutral towards to Revising and Updating | Agree to Policymaking | Neutral towards to Revising and Updating Agree... | 775 768 | #5d40d3 | 83 | 318 | 0.361446 | 0.094340 | 0.034099 | Neutral towards to Revising and Updating 3.41 | Agree to Policymaking | 30 |
| 29 | 772 | 767 | 56 | Agree to Revising and Updating | Strongly agree to Policymaking | Agree to Revising and Updating Strongly agree ... | 772 767 | #7fbcbb | 353 | 278 | 0.158640 | 0.201439 | 0.031956 | Agree to Revising and Updating 3.2 | Strongly agree to Policymaking | 56 |
| 39 | 772 | 769 | 24 | Agree to Revising and Updating | Neutral towards to Policymaking | Agree to Revising and Updating Neutral towards... | 772 769 | #eb968b | 353 | 68 | 0.067989 | 0.352941 | 0.023996 | Agree to Revising and Updating 2.4 | Neutral towards to Policymaking | 24 |
| 8 | 769 | 762 | 22 | Neutral towards to Policymaking | Agree to dissemination of GK | Neutral towards to Policymaking Agree to disse... | 769 762 | #92bec1 | 68 | 339 | 0.323529 | 0.064897 | 0.020996 | Neutral towards to Policymaking 2.1 | Agree to dissemination of GK | 22 |
| 17 | 768 | 765 | 14 | Agree to Policymaking | Disagree to dissemination of GK | Agree to Policymaking Disagree to disseminatio... | 768 765 | #65f407 | 318 | 35 | 0.044025 | 0.400000 | 0.017610 | Agree to Policymaking 1.76 | Disagree to dissemination of GK | 14 |
| 9 | 770 | 762 | 15 | Disagree to Policymaking | Agree to dissemination of GK | Disagree to Policymaking Agree to disseminatio... | 770 762 | #748481 | 39 | 339 | 0.384615 | 0.044248 | 0.017018 | Disagree to Policymaking 1.7 | Agree to dissemination of GK | 15 |
| 4 | 770 | 761 | 6 | Disagree to Policymaking | Strongly disagree to dissemination of GK | Disagree to Policymaking Strongly disagree to ... | 770 761 | #49b3ae | 39 | 56 | 0.153846 | 0.107143 | 0.016484 | Disagree to Policymaking 1.65 | Strongly disagree to dissemination of GK | 6 |
| 45 | 775 | 770 | 7 | Neutral towards to Revising and Updating | Disagree to Policymaking | Neutral towards to Revising and Updating Disag... | 775 770 | #d94e86 | 83 | 39 | 0.084337 | 0.179487 | 0.015137 | Neutral towards to Revising and Updating 1.51 | Disagree to Policymaking | 7 |
| 21 | 768 | 764 | 15 | Agree to Policymaking | Neutral towards to dissemination of GK | Agree to Policymaking Neutral towards to disse... | 768 764 | #973e94 | 318 | 48 | 0.047170 | 0.312500 | 0.014741 | Agree to Policymaking 1.47 | Neutral towards to dissemination of GK | 15 |
| 43 | 772 | 770 | 13 | Agree to Revising and Updating | Disagree to Policymaking | Agree to Revising and Updating Disagree to Pol... | 772 770 | #3d1022 | 353 | 39 | 0.036827 | 0.333333 | 0.012276 | Agree to Revising and Updating 1.23 | Disagree to Policymaking | 13 |
| 13 | 769 | 763 | 15 | Neutral towards to Policymaking | Strongly agree to dissemination of GK | Neutral towards to Policymaking Strongly agree... | 769 763 | #8b4926 | 68 | 276 | 0.220588 | 0.054348 | 0.011988 | Neutral towards to Policymaking 1.2 | Strongly agree to dissemination of GK | 15 |
| 15 | 766 | 765 | 4 | Strongly disagree to Policymaking | Disagree to dissemination of GK | Strongly disagree to Policymaking Disagree to ... | 766 765 | #59a586 | 45 | 35 | 0.088889 | 0.114286 | 0.010159 | Strongly disagree to Policymaking 1.02 | Disagree to dissemination of GK | 4 |
| 23 | 770 | 764 | 4 | Disagree to Policymaking | Neutral towards to dissemination of GK | Disagree to Policymaking Neutral towards to di... | 770 764 | #d567a4 | 39 | 48 | 0.102564 | 0.083333 | 0.008547 | Disagree to Policymaking 0.85 | Neutral towards to dissemination of GK | 4 |
| 35 | 773 | 768 | 22 | Strongly agree to Revising and Updating | Agree to Policymaking | Strongly agree to Revising and Updating Agree ... | 773 768 | #8352c8 | 231 | 318 | 0.095238 | 0.069182 | 0.006589 | Strongly agree to Revising and Updating 0.66 | Agree to Policymaking | 22 |
| 32 | 775 | 767 | 12 | Neutral towards to Revising and Updating | Strongly agree to Policymaking | Neutral towards to Revising and Updating Stron... | 775 767 | #1b4c87 | 83 | 278 | 0.144578 | 0.043165 | 0.006241 | Neutral towards to Revising and Updating 0.62 | Strongly agree to Policymaking | 12 |
| 36 | 774 | 768 | 8 | Disagree to Revising and Updating | Agree to Policymaking | Disagree to Revising and Updating Agree to Pol... | 774 768 | #15de56 | 38 | 318 | 0.210526 | 0.025157 | 0.005296 | Disagree to Revising and Updating 0.53 | Agree to Policymaking | 8 |
| 27 | 774 | 766 | 3 | Disagree to Revising and Updating | Strongly disagree to Policymaking | Disagree to Revising and Updating Strongly dis... | 774 766 | #027724 | 38 | 45 | 0.078947 | 0.066667 | 0.005263 | Disagree to Revising and Updating 0.53 | Strongly disagree to Policymaking | 3 |
| 3 | 769 | 761 | 4 | Neutral towards to Policymaking | Strongly disagree to dissemination of GK | Neutral towards to Policymaking Strongly disag... | 769 761 | #57b34f | 68 | 56 | 0.058824 | 0.071429 | 0.004202 | Neutral towards to Policymaking 0.42 | Strongly disagree to dissemination of GK | 4 |
| 18 | 769 | 765 | 3 | Neutral towards to Policymaking | Disagree to dissemination of GK | Neutral towards to Policymaking Disagree to di... | 769 765 | #de8c1d | 68 | 35 | 0.044118 | 0.085714 | 0.003782 | Neutral towards to Policymaking 0.38 | Disagree to dissemination of GK | 3 |
| 41 | 774 | 769 | 3 | Disagree to Revising and Updating | Neutral towards to Policymaking | Disagree to Revising and Updating Neutral towa... | 774 769 | #6fc34e | 38 | 68 | 0.078947 | 0.044118 | 0.003483 | Disagree to Revising and Updating 0.35 | Neutral towards to Policymaking | 3 |
| 31 | 774 | 767 | 6 | Disagree to Revising and Updating | Strongly agree to Policymaking | Disagree to Revising and Updating Strongly agr... | 774 767 | #e213e2 | 38 | 278 | 0.157895 | 0.021583 | 0.003408 | Disagree to Revising and Updating 0.34 | Strongly agree to Policymaking | 6 |
| 40 | 773 | 769 | 6 | Strongly agree to Revising and Updating | Neutral towards to Policymaking | Strongly agree to Revising and Updating Neutra... | 773 769 | #ee4859 | 231 | 68 | 0.025974 | 0.088235 | 0.002292 | Strongly agree to Revising and Updating 0.23 | Neutral towards to Policymaking | 6 |
| 1 | 767 | 761 | 5 | Strongly agree to Policymaking | Strongly disagree to dissemination of GK | Strongly agree to Policymaking Strongly disagr... | 767 761 | #7bbd6b | 278 | 56 | 0.017986 | 0.089286 | 0.001606 | Strongly agree to Policymaking 0.16 | Strongly disagree to dissemination of GK | 5 |
| 25 | 772 | 766 | 5 | Agree to Revising and Updating | Strongly disagree to Policymaking | Agree to Revising and Updating Strongly disagr... | 772 766 | #fbe3d6 | 353 | 45 | 0.014164 | 0.111111 | 0.001574 | Agree to Revising and Updating 0.16 | Strongly disagree to Policymaking | 5 |
| 38 | 771 | 769 | 2 | Strongly disagree to Revising and Updating | Neutral towards to Policymaking | Strongly disagree to Revising and Updating Neu... | 771 769 | #570f0c | 41 | 68 | 0.048780 | 0.029412 | 0.001435 | Strongly disagree to Revising and Updating 0.14 | Neutral towards to Policymaking | 2 |
| 2 | 768 | 761 | 5 | Agree to Policymaking | Strongly disagree to dissemination of GK | Agree to Policymaking Strongly disagree to dis... | 768 761 | #792808 | 318 | 56 | 0.015723 | 0.089286 | 0.001404 | Agree to Policymaking 0.14 | Strongly disagree to dissemination of GK | 5 |
| 10 | 766 | 763 | 4 | Strongly disagree to Policymaking | Strongly agree to dissemination of GK | Strongly disagree to Policymaking Strongly agr... | 766 763 | #a05d02 | 45 | 276 | 0.088889 | 0.014493 | 0.001288 | Strongly disagree to Policymaking 0.13 | Strongly agree to dissemination of GK | 4 |
| 20 | 767 | 764 | 4 | Strongly agree to Policymaking | Neutral towards to dissemination of GK | Strongly agree to Policymaking Neutral towards... | 767 764 | #bbb264 | 278 | 48 | 0.014388 | 0.083333 | 0.001199 | Strongly agree to Policymaking 0.12 | Neutral towards to dissemination of GK | 4 |
| 5 | 766 | 762 | 3 | Strongly disagree to Policymaking | Agree to dissemination of GK | Strongly disagree to Policymaking Agree to dis... | 766 762 | #d20880 | 45 | 339 | 0.066667 | 0.008850 | 0.000590 | Strongly disagree to Policymaking 0.06 | Agree to dissemination of GK | 3 |
| 16 | 767 | 765 | 2 | Strongly agree to Policymaking | Disagree to dissemination of GK | Strongly agree to Policymaking Disagree to dis... | 767 765 | #67fc93 | 278 | 35 | 0.007194 | 0.057143 | 0.000411 | Strongly agree to Policymaking 0.04 | Disagree to dissemination of GK | 2 |
| 26 | 773 | 766 | 2 | Strongly agree to Revising and Updating | Strongly disagree to Policymaking | Strongly agree to Revising and Updating Strong... | 773 766 | #b1945b | 231 | 45 | 0.008658 | 0.044444 | 0.000385 | Strongly agree to Revising and Updating 0.04 | Strongly disagree to Policymaking | 2 |
| 14 | 770 | 763 | 2 | Disagree to Policymaking | Strongly agree to dissemination of GK | Disagree to Policymaking Strongly agree to dis... | 770 763 | #c6c619 | 39 | 276 | 0.051282 | 0.007246 | 0.000372 | Disagree to Policymaking 0.04 | Strongly agree to dissemination of GK | 2 |
| 28 | 771 | 767 | 2 | Strongly disagree to Revising and Updating | Strongly agree to Policymaking | Strongly disagree to Revising and Updating Str... | 771 767 | #7a750e | 41 | 278 | 0.048780 | 0.007194 | 0.000351 | Strongly disagree to Revising and Updating 0.04 | Strongly agree to Policymaking | 2 |
| 33 | 771 | 768 | 2 | Strongly disagree to Revising and Updating | Agree to Policymaking | Strongly disagree to Revising and Updating Agr... | 771 768 | #02b8c8 | 41 | 318 | 0.048780 | 0.006289 | 0.000307 | Strongly disagree to Revising and Updating 0.03 | Agree to Policymaking | 2 |
xor_triple
| | 1 | 2 | 3 | count | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|
| 22 | 762 | 768 | 772 | 185 | Agree to dissemination of GK | Agree to Policymaking | Agree to Revising and Updating |
| 37 | 763 | 767 | 773 | 149 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to Revising and Updating |
| 19 | 762 | 767 | 773 | 45 | Agree to dissemination of GK | Strongly agree to Policymaking | Strongly agree to Revising and Updating |
| 40 | 763 | 768 | 772 | 43 | Strongly agree to dissemination of GK | Agree to Policymaking | Agree to Revising and Updating |
| 36 | 763 | 767 | 772 | 38 | Strongly agree to dissemination of GK | Strongly agree to Policymaking | Agree to Revising and Updating |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1 | 761 | 766 | 772 | 1 | Strongly disagree to dissemination of GK | Strongly disagree to Policymaking | Agree to Revising and Updating |
| 34 | 763 | 766 | 772 | 1 | Strongly agree to dissemination of GK | Strongly disagree to Policymaking | Agree to Revising and Updating |
| 27 | 762 | 769 | 773 | 1 | Agree to dissemination of GK | Neutral towards to Policymaking | Strongly agree to Revising and Updating |
| 17 | 762 | 766 | 773 | 1 | Agree to dissemination of GK | Strongly disagree to Policymaking | Strongly agree to Revising and Updating |
| 6 | 761 | 767 | 774 | 1 | Strongly disagree to dissemination of GK | Strongly agree to Policymaking | Disagree to Revising and Updating |
73 rows × 7 columns
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.style.use('classic')
df_graph = rnxn_combined
df_graph['From'] = df_graph[3].map(str) + ' ' + ((df_graph['p1p2'] * 100).round(2)).map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts']
colors = cls['colour']
weights = df_graph['counts']
G = networkx.from_pandas_edgelist(
    df_graph, source="From", target="To", edge_attr="Count"
)
# Dynamic node sizes: scale each node's degree for plotting
scale = 100
d = dict(G.degree)
d.update((x, scale * y) for x, y in d.items())
plt.figure(figsize=(15, 15))
plt.rcParams['figure.facecolor'] = 'white'
# draw_networkx draws onto the current figure and returns None,
# so its result must not be assigned back to G
networkx.draw_networkx(G, pos=networkx.nx_pydot.graphviz_layout(G), edge_color=colors,
                       node_color='blue', alpha=1, width=weights * 0.1, arrows=False,
                       with_labels=True, font_size=10, font_family='sans-serif')
plt.tight_layout()
plt.savefig('network_35_45_53.png', dpi=300)
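The edge-list-to-graph step above can be sketched on toy data (illustrative `From`/`To`/`Count` values mirroring the real `df_graph` columns), including the degree-scaling trick used for node sizes:

```python
import pandas as pd
import networkx

toy = pd.DataFrame({'From': ['A', 'A', 'B'], 'To': ['B', 'C', 'C'], 'Count': [5, 2, 1]})
# Build an undirected graph straight from the edge-list columns
G = networkx.from_pandas_edgelist(toy, source='From', target='To', edge_attr='Count')
# Node sizes scaled by degree, as in the dict-update step above
scale = 100
sizes = {n: scale * deg for n, deg in dict(G.degree).items()}
```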
from pyvis.network import Network
got_net = Network(height='1080px', width='100%', bgcolor='#ffffff', font_color='black', directed=False)
# set the physics layout of the network
# got_net.barnes_hut()
got_data = rnxn_combined
got_data = got_data[got_data['p1p2'] >= 0.1]
sources = got_data[3]
targets = got_data[4]
weights_edges = got_data['p1p2'].round(3)
weights_n1 = got_data['p1'].round(3)
weights_n2 = got_data['p2'].round(3)
colours = got_data['colour']
edge_data = zip(sources, targets, weights_edges, weights_n1, weights_n2, colours)
for e in edge_data:
    src = e[0]
    dst = e[1]
    we = e[2]
    wn1 = e[3]
    wn2 = e[4]
    c = e[5]
    got_net.add_node(src, src, title=src, value=wn1, color=c)
    got_net.add_node(dst, dst, title=dst, value=wn2, color=c)
    got_net.add_edge(src, dst, value=we, color=c)
neighbor_map = got_net.get_adj_list()
edges = got_net.get_edges()
nodes=got_net.get_nodes()
N_nodes=len(nodes)
N_edges=len(edges)
weights = [[] for i in range(N_nodes)]
# Associating weights to neighbors
for i in range(N_nodes):  # loop through nodes
    for neighbor in neighbor_map[nodes[i]]:  # and their neighbors
        for j in range(N_edges):  # find the edge between node and neighbor
            if (edges[j]['from'] == nodes[i] and edges[j]['to'] == neighbor) or \
               (edges[j]['from'] == neighbor and edges[j]['to'] == nodes[i]):
                weights[i].append(edges[j]['value'])
for node, i in zip(got_net.nodes, range(N_nodes)):
    node['value'] = len(neighbor_map[node['id']])
    node['weight'] = [str(weights[i][k]) for k in range(len(weights[i]))]
    list_neighbor = list(neighbor_map[node['id']])
    # Concatenating neighbors and weights
    hover_str = [list_neighbor[k] + ' ' + node['weight'][k] for k in range(node['value'])]
    # Setting up node title for hovering
    node['title'] += ' Neighbors:<br>' + '<br>'.join(hover_str)
got_net.show_buttons(filter_=['physics'])
got_net.show('allnet_network_35_45_53.html')
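The hover titles assembled above reduce to plain string concatenation; a pyvis-free sketch with illustrative values:

```python
# Each node's neighbors are paired with the matching edge weights and
# joined into the HTML string pyvis shows on hover.
neighbor_map = {'A': ['B', 'C']}
weights = {'A': [0.62, 0.48]}
node = {'id': 'A', 'title': 'A'}
hover_str = [n + ' ' + str(w) for n, w in zip(neighbor_map[node['id']], weights[node['id']])]
node['title'] += ' Neighbors:<br>' + '<br>'.join(hover_str)
```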
# render dataframe as html
html = rnxn_combined.to_html()
# write html to file
with open("PATHS_RNXN_COMBINED.html", "w") as text_file:
    text_file.write(html)
# render dataframe as html
html = rnxn_triple.to_html()
# write html to file
with open("PATHS_RNXN_TRIPLE.html", "w") as text_file:
    text_file.write(html)
# functions and imports
from scipy.stats import gaussian_kde
from numpy import mean
from numpy import std
from scipy.stats import mannwhitneyu
from scipy.stats import ttest_ind
from scipy.stats import f_oneway
from scipy import stats
import scikit_posthocs as sp
import statistics
def calc_curve(data):
    """Calculate the probability density of the data over a 501-point grid."""
    min_, max_ = data.min(), data.max()
    X = [min_ + i * ((max_ - min_) / 500) for i in range(501)]
    Y = gaussian_kde(data).evaluate(X)
    return X, Y
from plotly.offline import plot
data1 = rnxn_triple['p1p2']
data2 = rnxn_combined['p1p2']
X1, Y1 = calc_curve(data1)
X2, Y2 = calc_curve(data2)
traces = []
traces.append({'x': X1, 'y': Y1, 'name': 'Triple'})
traces.append({'x': X2, 'y': Y2, 'name': 'Combined'})
plot({'data': traces})
'temp-plot.html'
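A quick sanity check of the `calc_curve` grid/KDE logic on synthetic data: the density evaluated over the 501-point grid should integrate to roughly 1 (slightly less, since the KDE tails extend past the data range).

```python
import numpy as np
from scipy.stats import gaussian_kde

data = np.random.default_rng(0).normal(size=500)
min_, max_ = data.min(), data.max()
step = (max_ - min_) / 500
X = [min_ + i * step for i in range(501)]
Y = gaussian_kde(data).evaluate(X)
area = float(np.sum(Y) * step)  # crude Riemann sum over the grid
```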
cl_DF = BNdf.copy()
dummies = pd.get_dummies(BNdf['Option'])
dummies.head(5)
| | Agree to Policymaking | Agree to Revising and Updating | Agree to dissemination of GK | Children - Definitely | Children - Most Likely | Children - Never | Children - Under certain circumstances | Disagree to Policymaking | Disagree to Revising and Updating | Disagree to dissemination of GK | Do not know whether the data will be stored securely | Do not know who will have access to that information | Female Participants | Friends - Definitely | Friends - Most Likely | Friends - Never | Friends - Under certain circumstances | Future spouse or partner - Definitely | Future spouse or partner - Most Likely | Future spouse or partner - Never | Future spouse or partner - Under certain circumstances | High Concern | High GK Confidence | High GK Score | High Genetic Curiosity | I am concerned my data will be used for other purposes without my knowledge | I am not interested | I am worried some information about my physical or mental health could be used against me for example employment; legal matters; obtaining insurance | I am worried that I might find out something about myself I would rather not know | I would not want to be labelled as having any deficiency | I would rather not know of any potential debilitating diseases that I may develop in the future | Law Students | Low Concern | Low GK Confidence | Low GK Score | Low Genetic Curiosity | Male Participants | Medium Concern | Medium Genetic Curiosity | Neutral towards to Policymaking | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Non Law Students | Not Students | Older Participants | Other | Other - Definitely | Other - Most Likely | Other - Never | Other - Under certain circumstances | Other relatives - Definitely | Other relatives - Most Likely | Other relatives - Never | Other relatives - Under certain circumstances | Participants not related to law | Participants related to law | Siblings - Definitely | Siblings - Most Likely | Siblings - Never | Siblings - Under certain circumstances | Spouse or partner - Definitely | Spouse or partner - Most Likely | Spouse or partner - Never | Spouse or partner - Under certain circumstances | Strongly agree to Policymaking | Strongly agree to Revising and Updating | Strongly agree to dissemination of GK | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating | Strongly disagree to dissemination of GK | Students | Younger Participants |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
dummies['id'] = cl_DF.id
dummies
| Agree to Policymaking | Agree to Revising and Updating | Agree to dissemination of GK | Children - Definitely | Children - Most Likely | Children - Never | Children - Under certain circumstances | Disagree to Policymaking | Disagree to Revising and Updating | Disagree to dissemination of GK | Do not know whether the data will be stored securely | Do not know who will have access to that information | Female Participants | Friends - Definitely | Friends - Most Likely | Friends - Never | Friends - Under certain circumstances | Future spouse or partner - Definitely | Future spouse or partner - Most Likely | Future spouse or partner - Never | Future spouse or partner - Under certain circumstances | High Concern | High GK Confidence | High GK Score | High Genetic Curiosity | I am concerned my data will be used for other purposes without my knowledge | I am not interested | I am worried some information about my physical or mental health could be used against me for example employment; legal matters; obtaining insurance | I am worried that I might find out something about myself I would rather not know | I would not want to be labelled as having any deficiency | I would rather not know of any potential debilitating diseases that I may develop in the future | Law Students | Low Concern | Low GK Confidence | Low GK Score | Low Genetic Curiosity | Male Participants | Medium Concern | Medium Genetic Curiosity | Neutral towards to Policymaking | Neutral towards to Revising and Updating | Neutral towards to dissemination of GK | Non Law Students | Not Students | Older Participants | Other | Other - Definitely | Other - Most Likely | Other - Never | Other - Under certain circumstances | Other relatives - Definitely | Other relatives - Most Likely | Other relatives - Never | Other relatives - Under certain circumstances | Participants not related to law | Participants related to law | Siblings - Definitely | Siblings - Most Likely | Siblings - Never | Siblings - Under certain 
circumstances | Spouse or partner - Definitely | Spouse or partner - Most Likely | Spouse or partner - Never | Spouse or partner - Under certain circumstances | Strongly agree to Policymaking | Strongly agree to Revising and Updating | Strongly agree to dissemination of GK | Strongly disagree to Policymaking | Strongly disagree to Revising and Updating | Strongly disagree to dissemination of GK | Students | Younger Participants | id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19046 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1875 |
| 19047 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1885 |
| 19048 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1886 |
| 19049 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1887 |
| 19050 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1888 |
19051 rows × 73 columns
df = dummies.groupby(['id']).sum().reset_index()
del df['id']
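The groupby-sum above collapses each participant's many one-hot rows into a single indicator row per `id`. A minimal sketch with toy data (column names and values are illustrative, not from the survey):

```python
import pandas as pd

# toy long-format dummies: two rows for id 1, one row for id 2 (hypothetical)
dummies = pd.DataFrame({
    "Low Concern": [1, 0, 0],
    "Students":    [0, 1, 1],
    "id":          [1, 1, 2],
})

# one row per id; each column now counts how often that label applied
df = dummies.groupby(["id"]).sum().reset_index()
print(df)
```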
import dash_bio
fig = dash_bio.Clustergram(
    data=df,
    column_labels=list(df.columns.values),
    row_labels=list(df.index),
    height=2080,
    width=2080
)
for template in ["seaborn"]:
    fig.update_layout(template=template)
fig.write_html("/home/manu10/Downloads/iglas_work/BIG_Cluster.html")
#fig.show()
newbndf = pd.concat([curiosity_df, nen_df, con_27, likert_df]).reset_index()
del newbndf['index']
XDF = pd.merge(newbndf, maindf, on='id')
cXDF = XDF[['id', 'Option_x', 'Description_x', 'Option_y']].copy()
cXDF['id'] = 1
cb_xn_fin = cXDF.groupby(['Option_x', 'Option_y','Description_x'])['id'].sum().reset_index()
cb_xn_fin
cb_xn_fin['total'] = 1
total = cb_xn_fin.groupby(['Option_y']).total.sum().reset_index()
cb_xn = pd.merge(cb_xn_fin, total, on='Option_y')
cb_xn['prop'] = (cb_xn['id']/cb_xn['total_y']).round(3)
cb_xn
| Option_x | Option_y | Description_x | id | total_x | total_y | prop | |
|---|---|---|---|---|---|---|---|
| 0 | Agree to Policymaking | Female Participants | Policymaking | 210 | 1 | 232 | 0.905 |
| 1 | Agree to Revising and Updating | Female Participants | Revising and updating | 234 | 1 | 232 | 1.009 |
| 2 | Agree to dissemination of GK | Female Participants | Dissemination of GK | 224 | 1 | 232 | 0.966 |
| 3 | Children - Definitely | Female Participants | Would you be interested in finding out about g... | 178 | 1 | 232 | 0.767 |
| 4 | Children - Most Likely | Female Participants | Would you be interested in finding out about g... | 148 | 1 | 232 | 0.638 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4372 | Younger Participants I am worried some informa... | Younger Participants | Age | 361 | 1 | 231 | 1.563 |
| 4373 | Younger Participants I am worried that I might... | Younger Participants | Age | 71 | 1 | 231 | 0.307 |
| 4374 | Younger Participants I would not want to be la... | Younger Participants | Age | 132 | 1 | 231 | 0.571 |
| 4375 | Younger Participants I would rather not know o... | Younger Participants | Age | 64 | 1 | 231 | 0.277 |
| 4376 | Younger Participants Other | Younger Participants | Age | 19 | 1 | 231 | 0.082 |
4377 rows × 7 columns
new_df = cb_xn[['Option_x', 'Option_y', 'prop']].copy()
new_df
| Option_x | Option_y | prop | |
|---|---|---|---|
| 0 | Agree to Policymaking | Female Participants | 0.905 |
| 1 | Agree to Revising and Updating | Female Participants | 1.009 |
| 2 | Agree to dissemination of GK | Female Participants | 0.966 |
| 3 | Children - Definitely | Female Participants | 0.767 |
| 4 | Children - Most Likely | Female Participants | 0.638 |
| ... | ... | ... | ... |
| 4372 | Younger Participants I am worried some informa... | Younger Participants | 1.563 |
| 4373 | Younger Participants I am worried that I might... | Younger Participants | 0.307 |
| 4374 | Younger Participants I would not want to be la... | Younger Participants | 0.571 |
| 4375 | Younger Participants I would rather not know o... | Younger Participants | 0.277 |
| 4376 | Younger Participants Other | Younger Participants | 0.082 |
4377 rows × 3 columns
df = new_df
df = df.pivot_table(index = 'Option_x', columns = 'Option_y', values = 'prop')
df = df.dropna()
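The pivot above turns the long (`Option_x`, `Option_y`, `prop`) frame into a rectangular matrix suitable for clustering, with `dropna()` discarding options that lack a proportion for every group. A self-contained sketch of the same reshaping (labels and values are hypothetical):

```python
import pandas as pd

# long-form proportions (hypothetical labels and values)
new_df = pd.DataFrame({
    "Option_x": ["Low Concern", "Low Concern", "Students", "Students"],
    "Option_y": ["Female Participants", "Male Participants"] * 2,
    "prop": [0.4, 0.5, 0.7, 0.6],
})

# rows = Option_x, columns = Option_y, cells = prop
mat = new_df.pivot_table(index="Option_x", columns="Option_y", values="prop").dropna()
print(mat)
```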
import dash_bio
fig = dash_bio.Clustergram(
    data=df,
    column_labels=list(df.columns.values),
    row_labels=list(df.index),
    height=2080,
    width=3080
)
for template in ["seaborn"]:
    fig.update_layout(template=template)
fig.write_html("/home/manu10/Downloads/iglas_work/GROUPED_BIG_Cluster.html")
#fig.show()
metadata.Group = metadata['Group'].apply(str)
cat_select = metadata[metadata['Group'].isin(['8', '9'])]
nspecialdf = pd.merge(cat_select, specialdf, on='Variable')
nspecialdf['Option'] = nspecialdf['Option_x']+' '+nspecialdf['value']
sdf = nspecialdf[['id', 'Variable', 'Description_x', 'Option', 'Group_x']]
sdf.columns = ['id', 'Variable', 'Description', 'Option', 'Group']
sdf
| id | Variable | Description | Option | Group | |
|---|---|---|---|---|---|
| 0 | 0 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 1 | 1 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 2 | 3 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 3 | 5 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| 4 | 6 | LE3.045 | Please indicate whether the following endeavou... | Disease prevention and cure Positive | 8 |
| ... | ... | ... | ... | ... | ... |
| 24325 | 1889 | LE3.085 | Genetic science can contribute to the followin... | Other None | 9 |
| 24326 | 1893 | LE3.085 | Genetic science can contribute to the followin... | Other Negative | 9 |
| 24327 | 1903 | LE3.085 | Genetic science can contribute to the followin... | Other None | 9 |
| 24328 | 1905 | LE3.085 | Genetic science can contribute to the followin... | Other None | 9 |
| 24329 | 1911 | LE3.085 | Genetic science can contribute to the followin... | Other None | 9 |
24330 rows × 5 columns
BNdf['Group'] = BNdf['Group'].map(str)
BNdf.Group.unique()
array(['77', '24', '27', '3', '4', '5'], dtype=object)
giant_BN = pd.concat([sdf, BNdf], axis=0)
category_frame = giant_BN[giant_BN['Group'] == '77']
len_options = len(category_frame.Description.unique())
ranges = list(range(78, 78+len_options))
len(ranges) == len(category_frame.Description.unique())
True
options = category_frame.Description.unique()
categories = dict(zip(options,ranges))
categories
{'GK Score': 78,
'Gender': 79,
'Age': 80,
'Confidence in GK': 81,
'Related/ Not related to law': 82,
'Students/ Non Students': 83,
'Law or Non Law Students and Non Students': 84,
'Concern': 85,
'Genetic Curiosity': 86}
category_frame['Group'] = category_frame['Description']
category_frame['Group'] = category_frame['Group'].map(categories)
category_frame
| id | Variable | Description | Option | Group | |
|---|---|---|---|---|---|
| 0 | 0 | Class_X | GK Score | Low GK Score | 78 |
| 1 | 1 | Class_X | GK Score | High GK Score | 78 |
| 2 | 3 | Class_X | GK Score | High GK Score | 78 |
| 3 | 5 | Class_X | GK Score | Low GK Score | 78 |
| 4 | 14 | Class_X | GK Score | Low GK Score | 78 |
| ... | ... | ... | ... | ... | ... |
| 6952 | 1875 | Class_X | Genetic Curiosity | Low Genetic Curiosity | 86 |
| 6953 | 1885 | Class_X | Genetic Curiosity | Medium Genetic Curiosity | 86 |
| 6954 | 1886 | Class_X | Genetic Curiosity | Medium Genetic Curiosity | 86 |
| 6955 | 1887 | Class_X | Genetic Curiosity | Medium Genetic Curiosity | 86 |
| 6956 | 1888 | Class_X | Genetic Curiosity | Low Genetic Curiosity | 86 |
6957 rows × 5 columns
giant_BN = giant_BN[giant_BN['Group']!='77']
all_GR_BN = pd.concat([category_frame, giant_BN], axis=0)
all_GR_BN
| id | Variable | Description | Option | Group | |
|---|---|---|---|---|---|
| 0 | 0 | Class_X | GK Score | Low GK Score | 78 |
| 1 | 1 | Class_X | GK Score | High GK Score | 78 |
| 2 | 3 | Class_X | GK Score | High GK Score | 78 |
| 3 | 5 | Class_X | GK Score | Low GK Score | 78 |
| 4 | 14 | Class_X | GK Score | Low GK Score | 78 |
| ... | ... | ... | ... | ... | ... |
| 19046 | 1875 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
| 19047 | 1885 | LE3.201 | Revising and updating | Agree to Revising and Updating | 5 |
| 19048 | 1886 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
| 19049 | 1887 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
| 19050 | 1888 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
43381 rows × 5 columns
import itertools
from itertools import permutations
iterable = all_GR_BN.Group.unique()
all_select = list(itertools.permutations(iterable, 2))
len(all_select)
240
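The 240 above is exactly what `itertools.permutations(iterable, 2)` should give for the 16 group codes in play (9 classification categories, codes 78–86, plus the 7 string-coded question groups): n·(n−1) ordered pairs. A quick standalone check (the group values here are stand-ins):

```python
import itertools

# n distinct groups yield n*(n-1) ordered pairs of length 2
groups = list(range(16))  # 16 stand-in group codes
pairs = list(itertools.permutations(groups, 2))
print(len(pairs))  # 16 * 15 = 240
```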
for item in all_select:
    print(list(item))
[78, 79] [78, 80] [78, 81] [78, 82] [78, 83] [78, 84] [78, 85] [78, 86] [78, '8'] [78, '9'] [78, '24'] [78, '27'] [78, '3'] [78, '4'] [78, '5'] [79, 78] [79, 80] [79, 81] [79, 82] [79, 83] [79, 84] [79, 85] [79, 86] [79, '8'] [79, '9'] [79, '24'] [79, '27'] [79, '3'] [79, '4'] [79, '5'] [80, 78] [80, 79] [80, 81] [80, 82] [80, 83] [80, 84] [80, 85] [80, 86] [80, '8'] [80, '9'] [80, '24'] [80, '27'] [80, '3'] [80, '4'] [80, '5'] [81, 78] [81, 79] [81, 80] [81, 82] [81, 83] [81, 84] [81, 85] [81, 86] [81, '8'] [81, '9'] [81, '24'] [81, '27'] [81, '3'] [81, '4'] [81, '5'] [82, 78] [82, 79] [82, 80] [82, 81] [82, 83] [82, 84] [82, 85] [82, 86] [82, '8'] [82, '9'] [82, '24'] [82, '27'] [82, '3'] [82, '4'] [82, '5'] [83, 78] [83, 79] [83, 80] [83, 81] [83, 82] [83, 84] [83, 85] [83, 86] [83, '8'] [83, '9'] [83, '24'] [83, '27'] [83, '3'] [83, '4'] [83, '5'] [84, 78] [84, 79] [84, 80] [84, 81] [84, 82] [84, 83] [84, 85] [84, 86] [84, '8'] [84, '9'] [84, '24'] [84, '27'] [84, '3'] [84, '4'] [84, '5'] [85, 78] [85, 79] [85, 80] [85, 81] [85, 82] [85, 83] [85, 84] [85, 86] [85, '8'] [85, '9'] [85, '24'] [85, '27'] [85, '3'] [85, '4'] [85, '5'] [86, 78] [86, 79] [86, 80] [86, 81] [86, 82] [86, 83] [86, 84] [86, 85] [86, '8'] [86, '9'] [86, '24'] [86, '27'] [86, '3'] [86, '4'] [86, '5'] ['8', 78] ['8', 79] ['8', 80] ['8', 81] ['8', 82] ['8', 83] ['8', 84] ['8', 85] ['8', 86] ['8', '9'] ['8', '24'] ['8', '27'] ['8', '3'] ['8', '4'] ['8', '5'] ['9', 78] ['9', 79] ['9', 80] ['9', 81] ['9', 82] ['9', 83] ['9', 84] ['9', 85] ['9', 86] ['9', '8'] ['9', '24'] ['9', '27'] ['9', '3'] ['9', '4'] ['9', '5'] ['24', 78] ['24', 79] ['24', 80] ['24', 81] ['24', 82] ['24', 83] ['24', 84] ['24', 85] ['24', 86] ['24', '8'] ['24', '9'] ['24', '27'] ['24', '3'] ['24', '4'] ['24', '5'] ['27', 78] ['27', 79] ['27', 80] ['27', 81] ['27', 82] ['27', 83] ['27', 84] ['27', 85] ['27', 86] ['27', '8'] ['27', '9'] ['27', '24'] ['27', '3'] ['27', '4'] ['27', '5'] ['3', 78] ['3', 79] ['3', 80] ['3', 81] 
['3', 82] ['3', 83] ['3', 84] ['3', 85] ['3', 86] ['3', '8'] ['3', '9'] ['3', '24'] ['3', '27'] ['3', '4'] ['3', '5'] ['4', 78] ['4', 79] ['4', 80] ['4', 81] ['4', 82] ['4', 83] ['4', 84] ['4', 85] ['4', 86] ['4', '8'] ['4', '9'] ['4', '24'] ['4', '27'] ['4', '3'] ['4', '5'] ['5', 78] ['5', 79] ['5', 80] ['5', 81] ['5', 82] ['5', 83] ['5', 84] ['5', 85] ['5', 86] ['5', '8'] ['5', '9'] ['5', '24'] ['5', '27'] ['5', '3'] ['5', '4']
import ast
import random
grnxn = pd.DataFrame()
%%time
for item in all_select:
    select = list(item)
    nndf = all_GR_BN[all_GR_BN['Group'].isin(select)]
    sources = nndf[['id', 'Option']].copy()
    len_options = len(nndf.Option.unique())
    len_ids = len(nndf.id.unique()) + 1
    ranges = list(range(len_ids, len_ids + len_options))
    options = nndf.Option.unique()
    # get categorical codes
    categories = dict(zip(options, ranges))
    sources['codes'] = sources['Option'].map(categories)
    xtt = sources[['Option', 'codes']].copy()
    # get source codes and counts
    sources['codes'] = sources['codes'].map(str)
    counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
    nx = ("""'""" + counts['codes'].astype(str) + """'""").apply(lambda x: pd.Series(x)).stack().reset_index()  # convert string to series
    counts['xcodes'] = nx.iloc[:, 2]
    gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
    nx = ("""'""" + gcounts['Option'].astype(str) + """'""").apply(lambda x: pd.Series(x)).stack().reset_index()  # convert string to series
    gcounts['xoption'] = nx.iloc[:, 2]
    lel = pd.merge(counts, gcounts, on='id')
    del lel['codes']
    del lel['Option']
    # value counts per participant path
    wo = []
    for i in range(len(counts['xcodes'])):
        wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
    vc = pd.DataFrame(wo)
    # counts
    cxounts = pd.concat([lel, vc], axis=1)
    lex = cxounts.set_index(['id', 'xcodes', 'xoption']).stack().reset_index()
    lex['counts'] = lex[0]
    lex['codes'] = lex['level_3']
    del lex[0]
    del lex['level_3']
    # paths
    lex['path'] = "'" + lex["id"].astype(str) + "'," + lex["xcodes"]
    lex['label'] = "'" + lex["id"].astype(str) + "'," + lex["xoption"]
    lex['path'] = lex['path'].str.replace("'", '')
    lex['label'] = lex['label'].str.replace("'", '')
    lex["counts"] = lex["counts"].map(int)
    # paths and sources
    path_list = list(lex.path.unique())
    label_list = list(lex.xoption.unique())

    # corrected code
    def zigzag(seq):
        """Return consecutive (source, target) pairs from each sequence in `seq`"""
        seq_int = [list(map(int, x)) for x in seq]
        x = []
        y = []
        for i in seq_int:
            for j, k in zip(i, i[1:]):
                x.append(j)
                y.append(k)
        return list(zip(x, y))

    # get a path graph
    y = []
    for i in range(len(path_list)):
        y.append(list(path_list[i].split(',')))
    big_list = zigzag(y)

    # most common path
    c_path = pd.DataFrame(big_list)
    c_path = c_path[c_path[0].isin(ranges)]  # remove the participant id initials
    c_path[2] = c_path[0]
    c_path[0] = '1'
    tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
    xtagged = ("""'""" + tagged[0].astype(str) + """'""").apply(lambda x: pd.Series(x)).stack().reset_index()  # convert string to series
    xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
    ztagged = pd.concat([tagged, xtagged], axis=1)

    # map code pairs back to option labels
    inv_map = {str(v): str(k) for k, v in categories.items()}
    fif = ztagged[[1, 2, 0, 'counts']]
    fif[1] = fif[1].map(str)
    fif[3] = fif[1].map(inv_map)
    fif[2] = fif[2].map(str)
    fif[4] = fif[2].map(inv_map)
    del fif[0]
    fif['label'] = fif[3] + ' ' + fif[4]
    fif[1] = fif[1].map(int)
    fif[2] = fif[2].map(int)
    fif['connections'] = fif.iloc[:, 0].astype(str) + " " + fif.iloc[:, 1].astype(str)
    cls = pd.DataFrame()
    cls['connections'] = pd.DataFrame(fif['connections'].unique())
    # generate a random colour per unique connection
    amount = len(fif['connections'].unique())
    colour = []
    for i in range(0, amount):
        colour.append("#%06x" % random.randint(0, 0xFFFFFF))
    cls['colour'] = colour
    fif = pd.merge(fif, cls, on='connections')

    def nodify(node_names):
        """Return node x/y positions keyed on the first character of each name"""
        # unique name beginnings
        ends = sorted(list(set([e[0] for e in node_names])))
        # interval between x positions
        steps = 1 / 4
        # x-values for each unique name beginning, for input as node position
        nodes_x = {}
        xVal = 0
        for e in ends:
            nodes_x[str(e)] = xVal
            xVal += steps
        # x and y values in list form
        x_values = [nodes_x[n[0]] for n in node_names]
        y_values = [x * 0.03 for x in range(1, len(x_values) + 1)]
        return x_values, y_values

    # get some significant paths: options occurring together
    nfif = fif[fif['counts'] > 0]
    pax = pd.DataFrame(nndf).reset_index()
    pax.id = 1
    pax.drop('index', axis=1, inplace=True)
    pax = pax.groupby('Option')['id'].sum().reset_index()
    pax.columns = [3, 'id']
    nxn = pd.merge(nfif, pax, on=3)
    pax.columns = [4, 'idx']
    rnxn = pd.merge(nxn, pax, on=4)
    rnxn['p1'] = rnxn['counts'] / rnxn['id']
    rnxn['p2'] = rnxn['counts'] / rnxn['idx']
    rnxn['p1p2'] = rnxn['p1'] * rnxn['p2']
    #rnxn = rnxn[rnxn['p1p2'] >= .05]
    rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
    grnxn = pd.concat([grnxn, rnxn], axis=0)
CPU times: user 9min 13s, sys: 350 ms, total: 9min 13s Wall time: 9min 13s
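The `zigzag` helper inside the loop above turns each comma-joined path (participant id followed by option codes) into consecutive (source, target) pairs. An equivalent compact version, runnable on its own:

```python
def zigzag(seq):
    """Return consecutive (source, target) pairs from each sequence in `seq`."""
    pairs = []
    for s in seq:
        ints = list(map(int, s))
        pairs.extend(zip(ints, ints[1:]))
    return pairs

# a path "id,code,code,code" split into fields becomes adjacent edges
path = "17,776,775,774".split(",")
print(zigzag([path]))  # [(17, 776), (776, 775), (775, 774)]
```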
grnxn = grnxn.sort_values(['counts', 'p1p2'], ascending=[False, False]).drop_duplicates(subset=[3, 4], keep='last')
grnxn
| 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 776 | 775 | 533 | Students | Younger Participants | Students Younger Participants | 776 775 | #352d00 | 561 | 599 | 0.950089 | 0.889816 | 8.454048e-01 |
| 4 | 776 | 775 | 409 | Low GK Confidence | Younger Participants | Low GK Confidence Younger Participants | 776 775 | #da356c | 519 | 599 | 0.788054 | 0.682805 | 5.380869e-01 |
| 3 | 777 | 774 | 399 | Younger Participants | Low GK Score | Younger Participants Low GK Score | 777 774 | #b4fbc5 | 599 | 496 | 0.666110 | 0.804435 | 5.358427e-01 |
| 2 | 776 | 774 | 382 | Students | Low GK Confidence | Students Low GK Confidence | 776 774 | #137500 | 561 | 519 | 0.680927 | 0.736031 | 5.011832e-01 |
| 2 | 776 | 774 | 381 | Students | Low GK Score | Students Low GK Score | 776 774 | #d88726 | 561 | 496 | 0.679144 | 0.768145 | 5.216815e-01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 131 | 1480 | 1483 | 1 | Changing the structure of society Positive | New agricultural methods and products Positive | Changing the structure of society Positive New... | 1480 1483 | #bc49e7 | 773 | 1340 | 0.001294 | 0.000746 | 9.654187e-07 |
| 166 | 1516 | 1495 | 1 | Personalised education Positive | Developing biological weapons Negative | Personalised education Positive Developing bio... | 1516 1495 | #063ba4 | 1119 | 1140 | 0.000894 | 0.000877 | 7.839079e-07 |
| 139 | 1516 | 1483 | 1 | Personalised education Positive | New agricultural methods and products Positive | Personalised education Positive New agricultur... | 1516 1483 | #e0644f | 1119 | 1340 | 0.000894 | 0.000746 | 6.669068e-07 |
| 138 | 1495 | 1483 | 1 | Developing biological weapons Negative | New agricultural methods and products Positive | Developing biological weapons Negative New agr... | 1495 1483 | #f3a8ce | 1140 | 1340 | 0.000877 | 0.000746 | 6.546216e-07 |
| 14 | 1510 | 1486 | 1 | Increased Longevity Positive | Reducing food scarcity Positive | Increased Longevity Positive Reducing food sca... | 1510 1486 | #888745 | 1340 | 1260 | 0.000746 | 0.000794 | 5.922767e-07 |
3599 rows × 13 columns
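The `p1p2` strength is the product of two co-occurrence proportions: the pair count divided by each option's own total. Reproducing the top "Students / Younger Participants" row from the table above:

```python
# values taken from the first row of the grnxn table above
count_together = 533   # participants who are both Students and Younger
n_students = 561       # id column: total Students
n_younger = 599        # idx column: total Younger Participants

p1 = count_together / n_students   # share of Students who are also Younger
p2 = count_together / n_younger    # share of Younger Participants who are Students
strength = p1 * p2
print(round(p1, 6), round(p2, 6), round(strength, 7))
```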
data1 = grnxn['p1p2']
X1, Y1 = calc_curve(data1)
traces = []
traces.append({'x': X1, 'y': Y1, 'name': 'All GR Network'})
plot({'data': traces})
'temp-plot.html'
%matplotlib inline
import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')
# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding the best-fitting distribution"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0
    # Best holders
    best_distributions = []
    # Estimate distribution parameters from data
    for ii, distribution in enumerate([d for d in _distn_names if d not in ['levy_stable', 'studentized_range']]):
        print("{:>3} / {:<3}: {}".format(ii + 1, len(_distn_names), distribution))
        distribution = getattr(st, distribution)
        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore')
                # fit dist to data
                params = distribution.fit(data)
                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]
                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
                sse = np.sum(np.power(y - pdf, 2.0))
                # if an axis was passed in, add the fitted PDF to the plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass
                # record this candidate fit
                best_distributions.append((distribution, params, sse))
        except Exception:
            pass
    # sort by sum of squared errors, best fit first
    return sorted(best_distributions, key=lambda x: x[2])
def make_pdf(dist, params, size=10000):
    """Generate the distribution's probability density function (PDF)"""
    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]
    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)
    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, loc=loc, scale=scale, *arg)
    pdf = pd.Series(y, x)
    return pdf
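`make_pdf` evaluates the fitted density between the 1st and 99th percentiles so the plot has sane limits. A minimal sketch of the same idea for a plain normal fit (the sample data and seed are illustrative):

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# synthetic sample with known parameters (illustrative only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=5000)

# fit and evaluate the PDF between the 1st and 99th percentiles
loc, scale = st.norm.fit(sample)
x = np.linspace(st.norm.ppf(0.01, loc=loc, scale=scale),
                st.norm.ppf(0.99, loc=loc, scale=scale), 1000)
pdf = pd.Series(st.norm.pdf(x, loc=loc, scale=scale), index=x)
```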
# Load data from statsmodels datasets
data = grnxn['p1p2']
# Plot for comparison
plt.figure(figsize=(12,8))
ax = data.plot(kind='hist', bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams['axes.prop_cycle'])[1]['color'])
# Save plot limits
dataYLim = ax.get_ylim()
# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]
# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u'Strength of connection.\n All Fitted Distributions')
ax.set_xlabel(u'Strength (P1*P2)')
ax.set_ylabel('Frequency')
# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])
# Display
plt.figure(figsize=(12,8))
ax = pdf.plot(lw=2, label='PDF', legend=True)
data.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Data', legend=True, ax=ax)
param_names = (best_dist[0].shapes + ', loc, scale').split(', ') if best_dist[0].shapes else ['loc', 'scale']
param_str = ', '.join(['{}={:0.2f}'.format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = '{}({})'.format(best_dist[0].name, param_str)
ax.set_title(u'Strength with best fit distribution \n' + dist_str)
ax.set_xlabel(u'Strength (P1*P2)')
ax.set_ylabel('Frequency')
1 / 104: ksone 2 / 104: kstwo 3 / 104: kstwobign 4 / 104: norm 5 / 104: alpha 6 / 104: anglit 7 / 104: arcsine 8 / 104: beta 9 / 104: betaprime 10 / 104: bradford 11 / 104: burr 12 / 104: burr12 13 / 104: fisk 14 / 104: cauchy 15 / 104: chi 16 / 104: chi2 17 / 104: cosine 18 / 104: dgamma 19 / 104: dweibull 20 / 104: expon 21 / 104: exponnorm 22 / 104: exponweib 23 / 104: exponpow 24 / 104: fatiguelife 25 / 104: foldcauchy 26 / 104: f 27 / 104: foldnorm 28 / 104: weibull_min 29 / 104: weibull_max 30 / 104: genlogistic 31 / 104: genpareto 32 / 104: genexpon 33 / 104: genextreme 34 / 104: gamma 35 / 104: erlang 36 / 104: gengamma 37 / 104: genhalflogistic 38 / 104: genhyperbolic 39 / 104: gompertz 40 / 104: gumbel_r 41 / 104: gumbel_l 42 / 104: halfcauchy 43 / 104: halflogistic 44 / 104: halfnorm 45 / 104: hypsecant 46 / 104: gausshyper 47 / 104: invgamma 48 / 104: invgauss 49 / 104: geninvgauss 50 / 104: norminvgauss 51 / 104: invweibull 52 / 104: johnsonsb 53 / 104: johnsonsu 54 / 104: laplace 55 / 104: laplace_asymmetric 56 / 104: levy 57 / 104: levy_l 58 / 104: logistic 59 / 104: loggamma 60 / 104: loglaplace 61 / 104: lognorm 62 / 104: gilbrat 63 / 104: maxwell 64 / 104: mielke 65 / 104: kappa4 66 / 104: kappa3 67 / 104: moyal 68 / 104: nakagami 69 / 104: ncx2 70 / 104: ncf 71 / 104: t 72 / 104: nct 73 / 104: pareto 74 / 104: lomax 75 / 104: pearson3 76 / 104: powerlaw 77 / 104: powerlognorm 78 / 104: powernorm 79 / 104: rdist 80 / 104: rayleigh 81 / 104: loguniform 82 / 104: reciprocal 83 / 104: rice 84 / 104: recipinvgauss 85 / 104: semicircular 86 / 104: skewcauchy 87 / 104: skewnorm 88 / 104: trapezoid 89 / 104: trapz 90 / 104: triang 91 / 104: truncexpon 92 / 104: truncnorm 93 / 104: tukeylambda 94 / 104: uniform 95 / 104: vonmises 96 / 104: vonmises_line 97 / 104: wald 98 / 104: wrapcauchy 99 / 104: gennorm 100 / 104: halfgennorm 101 / 104: crystalball 102 / 104: argus
Text(0, 0.5, 'Frequency')
from pyvis.network import Network
from itertools import combinations
import networkx
import nxviz as nv
import matplotlib as mpl
mpl.style.use('classic')
df_graph = grnxn[grnxn['counts'] > 5].copy()  # copy to avoid SettingWithCopyWarning on the assignments below
df_graph['From'] = df_graph[3].map(str)+' '+ df_graph['counts'].map(str)
df_graph['To'] = df_graph[4]
df_graph['Count'] = df_graph['counts']
colors=cls['colour']
weights = df_graph['counts']
G = networkx.from_pandas_edgelist(
df_graph, source="From", target="To", edge_attr="Count"
)
####
# dynamic node sizes
scale = 1 # multiplier applied to each node's degree below
d = dict(G.degree)
#Updating dict
d.update((x, scale*y) for x, y in d.items())
####
plt.figure(figsize=(40,40))
plt.rcParams['figure.facecolor'] = 'white'
# draw_networkx returns None, so do not assign its result back to G
networkx.draw_networkx(G, edge_color=colors, node_color='blue', alpha=1, node_size=100,
width=weights*0.1, arrows=False, with_labels=True, font_size=6, font_family='sans-serif'
)
plt.tight_layout()
plt.savefig('all_combined_complete_GR.png', dpi=500)
from pyvis.network import Network
import pandas as pd
got_net = Network(height='1080px', width='100%', bgcolor='#ffffff', font_color='black', directed=False)
# set the physics layout of the network
# got_net.barnes_hut()
got_data = grnxn
got_data = got_data[got_data['p1p2'] >= 0.2]
sources = got_data[3]
targets = got_data[4]
weights_edges = got_data['p1p2'].round(3)
weights_n1 = got_data['p1'].round(3)
weights_n2 = got_data['p2'].round(3)
colours = got_data['colour']
edge_data = zip(sources, targets, weights_edges, weights_n1, weights_n2, colours)
for e in edge_data:
src = e[0]
dst = e[1]
we = e[2]
wn1 = e[3]
wn2 = e[4]
c = e[5]
got_net.add_node(src, src, title=src, value=wn1, color=c)
got_net.add_node(dst, dst, title=dst, value=wn2, color=c)
got_net.add_edge(src, dst, label=we, value=we, color=c)
neighbor_map = got_net.get_adj_list()
edges = got_net.get_edges()
nodes=got_net.get_nodes()
N_nodes=len(nodes)
N_edges=len(edges)
weights=[[] for i in range(N_nodes)]
#Associating weights to neighbors
for i in range(N_nodes): #Loop through nodes
for neighbor in neighbor_map[nodes[i]]: #and neighbors
for j in range(N_edges): #associate weights to the edge between node and neighbor
if (edges[j]['from']==nodes[i] and edges[j]['to']==neighbor) or \
(edges[j]['from']==neighbor and edges[j]['to']==nodes[i]):
weights[i].append(edges[j]['value'])
for node,i in zip(got_net.nodes,range(N_nodes)):
node['value']=len(neighbor_map[node['id']])
node['weight']=[str(weights[i][k]) for k in range(len(weights[i]))]
list_neighbor=list(neighbor_map[node['id']])
#Concatenating neighbors and weights
hover_str=[list_neighbor[k]+' '+ node['weight'][k] for k in range(node['value'])]
#Setting up node title for hovering
node['title']+=' Neighbors:<br>'+'<br>'.join(hover_str)
got_net.show_buttons(filter_=['physics'])
got_net.show('all_combined_complete_GR.html')
I have put a termination cell before this section to prevent the remaining cells from running.
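A termination cell can be as simple as raising an exception so that "Run All" halts at that point. The sketch below is illustrative (the notebook's actual cell may differ); `StopExecution` and `terminate` are names introduced here, not from the notebook:

```python
# A minimal "termination cell" sketch: raising an exception here makes
# "Run All" stop, so the exploratory cells below it are not executed.
class StopExecution(Exception):
    """Raised to halt notebook execution at this point."""

def terminate(run_remaining=False):
    # Flip run_remaining to True when the cells below should run.
    if not run_remaining:
        raise StopExecution("Stopping before the remaining cells.")
```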
all_GR_BN.columns
Index(['id', 'Variable', 'Description', 'Option', 'Group'], dtype='object')
ci_df = gk_df[['id', 'Variable', 'Description', 'Option', 'Group']]
category_frame = ci_df
ci_df.Group.unique()
# 7 items are present!
array([58, 60, 59, 61, 62, 63, 64])
new_BN_df = pd.DataFrame()
new_BN_df = pd.concat([category_frame, all_GR_BN], axis=0)
new_BN_df
| id | Variable | Description | Option | Group | |
|---|---|---|---|---|---|
| 0 | 0 | LE5.012 | What is a genome? | All the genes in the DNA | 58 |
| 1 | 1 | LE5.012 | What is a genome? | Correct - The entire sequence of DNA of an ind... | 58 |
| 3 | 3 | LE5.012 | What is a genome? | All the genes in the DNA | 58 |
| 5 | 5 | LE5.012 | What is a genome? | All the genes in the DNA | 58 |
| 7 | 14 | LE5.012 | What is a genome? | All the genes in the DNA | 58 |
| ... | ... | ... | ... | ... | ... |
| 19046 | 1875 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
| 19047 | 1885 | LE3.201 | Revising and updating | Agree to Revising and Updating | 5 |
| 19048 | 1886 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
| 19049 | 1887 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
| 19050 | 1888 | LE3.201 | Revising and updating | Strongly agree to Revising and Updating | 5 |
50280 rows × 5 columns
all_GR_BN.shape
(43381, 5)
new_BN_df.Group.unique()
array([58, 60, 59, 61, 62, 63, 64, 78, 79, 80, 81, 82, 83, 84, 85, 86,
'8', '9', '24', '27', '3', '4', '5'], dtype=object)
megadf.Group.unique()
array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11'],
dtype=object)
megadf['Group'].replace('8', '99', inplace=True)
select = ['1', '2', '99']
new_df = megadf[megadf['Group'].isin(select)]
new_df
| id | Description | Option | Variable | Group | |
|---|---|---|---|---|---|
| 6957 | 0 | In most instances, and assuming two parents wi... | Two legal guardians need to agree | LE3.087 | 1 |
| 6958 | 1 | In most instances, and assuming two parents wi... | Do not know | LE3.087 | 1 |
| 6959 | 3 | In most instances, and assuming two parents wi... | Two legal guardians need to agree | LE3.087 | 1 |
| 6960 | 5 | In most instances, and assuming two parents wi... | Two legal guardians need to agree | LE3.087 | 1 |
| 6961 | 14 | In most instances, and assuming two parents wi... | Two legal guardians need to agree | LE3.087 | 1 |
| ... | ... | ... | ... | ... | ... |
| 19639 | 1020 | Have you ever had genetic testing and why? | Other - Recommended by doctor | LE2.022 | 99 |
| 19640 | 1020 | Have you ever had genetic testing and why? | No | LE2.024 | 99 |
| 19641 | 1091 | Have you ever had genetic testing and why? | Other - Recommended by doctor | LE2.022 | 99 |
| 19642 | 1091 | Have you ever had genetic testing and why? | No | LE2.024 | 99 |
| 19643 | 1346 | Have you ever had genetic testing and why? | Other - Recommended by doctor | LE2.022 | 99 |
6453 rows × 5 columns
new_BN_df = pd.concat([new_BN_df, new_df], axis=0)
new_BN_df.Description.unique()
array(['What is a genome?',
'Which of the following 4 letter groups represent the ',
'On average, how much of their total DNA is the same in two people selected at random?',
'Genetic contribution to the risk of developing Schizophernia comes from -',
'The DNA sequence in two different cells, for example a neuron and a heart cell, of one person, is -',
'Some of the genes that relate to dyslexia also relate to ADHD -',
'If a report states ‘the heritability of insomnia is approximately 30 percent what would that mean?',
'GK Score', 'Gender', 'Age', 'Confidence in GK',
'Related/ Not related to law', 'Students/ Non Students',
'Law or Non Law Students and Non Students', 'Concern',
'Genetic Curiosity',
'Please indicate whether the following endeavours have positive negative or no impact on society',
'Genetic science can contribute to the following social changes. Indicate whether you consider these endeavours positive neutral or negative for society ',
'Would you be interested in finding out about genetic information',
'What concerns do participants have in relation to genetic testing',
'Dissemination of GK', 'Policymaking', 'Revising and updating',
'In most instances, and assuming two parents will be involved in raising a child, who should decide on sequencing a child’s genome at birth?',
'Have you ever had genetic testing and why?'], dtype=object)
new_BN_df.Group.unique()
array([58, 60, 59, 61, 62, 63, 64, 78, 79, 80, 81, 82, 83, 84, 85, 86,
'8', '9', '24', '27', '3', '4', '5', '1', '2', '99'], dtype=object)
len_options = len(new_BN_df.Description.unique())
ranges = list(range(100, 100+len_options))
len(ranges) == len(new_BN_df.Description.unique())
options = new_BN_df.Description.unique()
categories = dict(zip(options,ranges))
categories
{'What is a genome?': 100,
'Which of the following 4 letter groups represent the ': 101,
'On average, how much of their total DNA is the same in two people selected at random?': 102,
'Genetic contribution to the risk of developing Schizophernia comes from -': 103,
'The DNA sequence in two different cells, for example a neuron and a heart cell, of one person, is -': 104,
'Some of the genes that relate to dyslexia also relate to ADHD -': 105,
'If a report states ‘the heritability of insomnia is approximately 30 percent what would that mean?': 106,
'GK Score': 107,
'Gender': 108,
'Age': 109,
'Confidence in GK': 110,
'Related/ Not related to law': 111,
'Students/ Non Students': 112,
'Law or Non Law Students and Non Students': 113,
'Concern': 114,
'Genetic Curiosity': 115,
'Please indicate whether the following endeavours have positive negative or no impact on society': 116,
'Genetic science can contribute to the following social changes. Indicate whether you consider these endeavours positive neutral or negative for society ': 117,
'Would you be interested in finding out about genetic information': 118,
'What concerns do participants have in relation to genetic testing': 119,
'Dissemination of GK': 120,
'Policymaking': 121,
'Revising and updating': 122,
'In most instances, and assuming two parents will be involved in raising a child, who should decide on sequencing a child’s genome at birth?': 123,
'Have you ever had genetic testing and why?': 124}
new_BN_df['Group'] = new_BN_df['Description']
new_BN_df['Group'] = new_BN_df['Group'].map(categories)
new_BN_df
| id | Variable | Description | Option | Group | |
|---|---|---|---|---|---|
| 0 | 0 | LE5.012 | What is a genome? | All the genes in the DNA | 100 |
| 1 | 1 | LE5.012 | What is a genome? | Correct - The entire sequence of DNA of an ind... | 100 |
| 3 | 3 | LE5.012 | What is a genome? | All the genes in the DNA | 100 |
| 5 | 5 | LE5.012 | What is a genome? | All the genes in the DNA | 100 |
| 7 | 14 | LE5.012 | What is a genome? | All the genes in the DNA | 100 |
| ... | ... | ... | ... | ... | ... |
| 19639 | 1020 | LE2.022 | Have you ever had genetic testing and why? | Other - Recommended by doctor | 124 |
| 19640 | 1020 | LE2.024 | Have you ever had genetic testing and why? | No | 124 |
| 19641 | 1091 | LE2.022 | Have you ever had genetic testing and why? | Other - Recommended by doctor | 124 |
| 19642 | 1091 | LE2.024 | Have you ever had genetic testing and why? | No | 124 |
| 19643 | 1346 | LE2.022 | Have you ever had genetic testing and why? | Other - Recommended by doctor | 124 |
56733 rows × 5 columns
new_BN_df.Group.unique()
array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124])
metadata_all_GR_variables = new_BN_df[['Variable', 'Description', 'Option', 'Group']]
metadata_all_GR_variables= metadata_all_GR_variables.drop_duplicates(subset='Option', keep="last")
metadata_all_GR_variables.to_csv('metadata_all_GR_variables.tsv', index=False, sep='\t')
new_BN_df.to_csv('all_GR_variables.tsv', index=False, sep='\t')
metadata_all_GR_variables.Group.unique()
array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124])
categories
{'What is a genome?': 100,
'Which of the following 4 letter groups represent the ': 101,
'On average, how much of their total DNA is the same in two people selected at random?': 102,
'Genetic contribution to the risk of developing Schizophernia comes from -': 103,
'The DNA sequence in two different cells, for example a neuron and a heart cell, of one person, is -': 104,
'Some of the genes that relate to dyslexia also relate to ADHD -': 105,
'If a report states ‘the heritability of insomnia is approximately 30 percent what would that mean?': 106,
'GK Score': 107,
'Gender': 108,
'Age': 109,
'Confidence in GK': 110,
'Related/ Not related to law': 111,
'Students/ Non Students': 112,
'Law or Non Law Students and Non Students': 113,
'Concern': 114,
'Genetic Curiosity': 115,
'Please indicate whether the following endeavours have positive negative or no impact on society': 116,
'Genetic science can contribute to the following social changes. Indicate whether you consider these endeavours positive neutral or negative for society ': 117,
'Would you be interested in finding out about genetic information': 118,
'What concerns do participants have in relation to genetic testing': 119,
'Dissemination of GK': 120,
'Policymaking': 121,
'Revising and updating': 122,
'In most instances, and assuming two parents will be involved in raising a child, who should decide on sequencing a child’s genome at birth?': 123,
'Have you ever had genetic testing and why?': 124}
groups_not_to_keep = ['111', '117', '118', '124']
new_BN_df.Group = new_BN_df.Group.map(str)
all_GR_BN = new_BN_df[~new_BN_df['Group'].isin(groups_not_to_keep)]
all_GR_BN = all_GR_BN.reset_index()
del all_GR_BN['index']
import itertools
from itertools import permutations
iterable = all_GR_BN.Group.unique()
all_select = list(itertools.permutations(iterable, 2))
len(all_select)
420
This cell should take about 15 minutes to run.
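The loop below leans on the `zigzag` helper to turn each id-prefixed path into consecutive (source, target) edge pairs. A self-contained copy of that helper with a toy check (the codes here are made up, not survey data):

```python
def zigzag(seq):
    """Return consecutive (source, target) pairs from each integer path."""
    seq_int = [list(map(int, x)) for x in seq]
    pairs = []
    for path in seq_int:
        # each adjacent pair of elements becomes one edge
        for a, b in zip(path, path[1:]):
            pairs.append((a, b))
    return pairs

# path "participant 7 -> option 101 -> option 105" yields two edges,
# path "participant 8 -> option 101" yields one
edges = zigzag([['7', '101', '105'], ['8', '101']])
```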
grnxn = pd.DataFrame()
item_len = list()
import random
import ast  # ast.literal_eval is used inside the loop below
%%time
number = 0
for item in all_select:
select= list(item)
nndf = all_GR_BN[all_GR_BN['Group'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt=pd.DataFrame()
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# writing operations
wo = []
for i in range(len(counts['xcodes'])) :
wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
"""Return consecutive (source, target) pairs from each path in `seq`"""
seq_int = [list(map(int, x)) for x in seq]
x = []
y = []
for i in seq_int:
for j, k in zip(i, i[1:]):
x.append(j)
y.append(k)
return list(zip(x, y))
# get a path graph
y = []
for i in range(len(path_list)):
y.append(list(path_list[i].split(',')))
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] #remove the participant id initials
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = pd.DataFrame(fif['connections'].unique())
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(0, amount):
colour.append("#%06x" % random.randint(i, 0xFFFFFF))
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
def nodify(node_names):
# unique name beginnings
ends = sorted(list(set([e[0] for e in node_names])))
# intervals
steps = 1/4
# x-values for each unique name ending
# for input as node position
nodes_x = {}
xVal = 0
for e in ends:
nodes_x[str(e)] = xVal
xVal += steps
# x and y values in list form
x_values = [nodes_x[n[0]] for n in node_names]
y_values = [x*0.03 for x in range(1, len(x_values)+1)]
return x_values, y_values
# map colours to categories
import random
# generate random colours
amount = len(npaths['name'].unique())
colour = []
for i in range(0, amount):
colour.append("#%06x" % random.randint(i, 0xFFFFFF))
####### GET SOME SIGNIFICANT PATHS, options occurring together
nfif = fif[fif['counts'] > 0]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax.id = 1
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
grnxn = pd.concat([grnxn, rnxn], axis=0)
CPU times: user 14min 40s, sys: 460 ms, total: 14min 40s Wall time: 14min 41s
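The edge strength `p1p2` computed in the loop is the product of two co-occurrence rates: the share of one option's responses that also include the other, and vice versa. Plugging in the Students / Younger Participants counts from the result table (533 co-occurrences out of 561 and 599 totals) reproduces its top-row strength:

```python
# p1*p2 edge strength, illustrated with the Students / Younger Participants
# counts from the result table.
count_ab = 533   # responses containing both options
total_a = 561    # responses containing option A (Students)
total_b = 599    # responses containing option B (Younger Participants)

p1 = count_ab / total_a   # share of A responses that also include B
p2 = count_ab / total_b   # share of B responses that also include A
strength = p1 * p2        # symmetric co-occurrence strength in [0, 1]
```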
grnxn = grnxn.sort_values(['counts', 'p1p2'], ascending=[False, False]).drop_duplicates(subset=[3, 4], keep='last')
grnxn
| 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 776 | 775 | 533 | Students | Younger Participants | Students Younger Participants | 776 775 | #e27328 | 561 | 599 | 0.950089 | 0.889816 | 0.845405 |
| 2 | 1007 | 1002 | 451 | Correct – Many genes | Correct - GCTA | Correct – Many genes Correct - GCTA | 1007 1002 | #571720 | 658 | 651 | 0.685410 | 0.692780 | 0.474839 |
| 6 | 1000 | 999 | 450 | Correct - True | Correct – Many genes | Correct - True Correct – Many genes | 1000 999 | #811bd7 | 644 | 658 | 0.698758 | 0.683891 | 0.477874 |
| 1 | 991 | 987 | 444 | Correct - True | Correct - GCTA | Correct - True Correct - GCTA | 991 987 | #34180a | 644 | 651 | 0.689441 | 0.682028 | 0.470218 |
| 1 | 1021 | 1017 | 428 | Correct - GCTA | All the genes in the DNA | Correct - GCTA All the genes in the DNA | 1021 1017 | #f13fde | 651 | 641 | 0.657450 | 0.667707 | 0.438984 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15 | 1017 | 1021 | 1 | All the genes in the DNA | Correct - GCTA | All the genes in the DNA Correct - GCTA | 1017 1021 | #80c9bb | 641 | 651 | 0.001560 | 0.001536 | 0.000002 |
| 12 | 1016 | 1021 | 1 | All the genes in the DNA | Correct – Many genes | All the genes in the DNA Correct – Many genes | 1016 1021 | #01f986 | 641 | 658 | 0.001560 | 0.001520 | 0.000002 |
| 0 | 1001 | 1001 | 1 | Correct - GCTA | Correct - GCTA | Correct - GCTA Correct - GCTA | 1001 1001 | #3595c0 | 651 | 651 | 0.001536 | 0.001536 | 0.000002 |
| 13 | 1002 | 1007 | 1 | Correct - GCTA | Correct – Many genes | Correct - GCTA Correct – Many genes | 1002 1007 | #b93fab | 651 | 658 | 0.001536 | 0.001520 | 0.000002 |
| 8 | 1008 | 1008 | 1 | Correct – Many genes | Correct – Many genes | Correct – Many genes Correct – Many genes | 1008 1008 | #1cf559 | 658 | 658 | 0.001520 | 0.001520 | 0.000002 |
3974 rows × 13 columns
grnxn.to_csv('GK_and_other_network.tsv', index=False, sep='\t')
data1 = grnxn['p1p2']
X1, Y1 = calc_curve(data1)
traces = []
traces.append({'x': X1, 'y': Y1, 'name': 'GK and Other Variables'})
plot({'data': traces})
'temp-plot.html'
%matplotlib inline
import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')
# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
"""Model data by finding best fit distribution to data"""
# Get histogram of original data
y, x = np.histogram(data, bins=bins, density=True)
x = (x + np.roll(x, -1))[:-1] / 2.0
# Best holders
best_distributions = []
# Estimate distribution parameters from data
for ii, distribution in enumerate([d for d in _distn_names if not d in ['levy_stable', 'studentized_range']]):
print("{:>3} / {:<3}: {}".format( ii+1, len(_distn_names), distribution ))
distribution = getattr(st, distribution)
# Try to fit the distribution
try:
# Ignore warnings from data that can't be fit
with warnings.catch_warnings():
warnings.filterwarnings('ignore')
# fit dist to data
params = distribution.fit(data)
# Separate parts of parameters
arg = params[:-2]
loc = params[-2]
scale = params[-1]
# Calculate fitted PDF and error with fit in distribution
pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
sse = np.sum(np.power(y - pdf, 2.0))
# if axis pass in add to plot
try:
if ax:
pd.Series(pdf, x).plot(ax=ax)
except Exception:
pass
# identify if this distribution is better
best_distributions.append((distribution, params, sse))
except Exception:
pass
return sorted(best_distributions, key=lambda x:x[2])
def make_pdf(dist, params, size=10000):
"""Generate distributions's Probability Distribution Function """
# Separate parts of parameters
arg = params[:-2]
loc = params[-2]
scale = params[-1]
# Get sane start and end points of distribution
start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = dist.pdf(x, loc=loc, scale=scale, *arg)
pdf = pd.Series(y, x)
return pdf
# Load data from statsmodels datasets
data = grnxn['p1p2']
# Plot for comparison
plt.figure(figsize=(12,8))
ax = data.plot(kind='hist', bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams['axes.prop_cycle'])[1]['color'])
# Save plot limits
dataYLim = ax.get_ylim()
# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]
# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u'Strength of connection.\n All Fitted Distributions')
ax.set_xlabel(u'Strength (P1*P2)')
ax.set_ylabel('Frequency')
# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])
# Display
plt.figure(figsize=(12,8))
ax = pdf.plot(lw=2, label='PDF', legend=True)
data.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Data', legend=True, ax=ax)
param_names = (best_dist[0].shapes + ', loc, scale').split(', ') if best_dist[0].shapes else ['loc', 'scale']
param_str = ', '.join(['{}={:0.2f}'.format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = '{}({})'.format(best_dist[0].name, param_str)
ax.set_title(u'Strength with best fit distribution \n' + dist_str)
ax.set_xlabel(u'Strength (P1*P2)')
ax.set_ylabel('Frequency')
(fit progress log: 102 of 104 candidate SciPy distributions tried, ksone through argus; levy_stable and studentized_range skipped)
Text(0, 0.5, 'Frequency')
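As a sanity check on the fitting machinery used above, `scipy.stats` fitting recovers known parameters from synthetic data. This sketch uses a normal distribution purely for illustration, not because it is the best fit for `p1p2`:

```python
import numpy as np
import scipy.stats as st

# draw a reproducible sample from a known normal distribution
sample = st.norm.rvs(loc=2.0, scale=0.5, size=5000,
                     random_state=np.random.default_rng(0))

# the maximum-likelihood fit should land close to the generating parameters
loc, scale = st.norm.fit(sample)
```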
from pyvis.network import Network
import pandas as pd
got_net = Network(height='1080px', width='100%', bgcolor='#ffffff', font_color='black', directed=False)
# set the physics layout of the network
# got_net.barnes_hut()
got_data = grnxn
got_data = got_data[got_data['p1p2'] >= 0.2]
sources = got_data[3]
targets = got_data[4]
weights_edges = got_data['p1p2'].round(3)
weights_n1 = got_data['p1'].round(3)
weights_n2 = got_data['p2'].round(3)
colours = got_data['colour']
edge_data = zip(sources, targets, weights_edges, weights_n1, weights_n2, colours)
for e in edge_data:
src = e[0]
dst = e[1]
we = e[2]
wn1 = e[3]
wn2 = e[4]
c = e[5]
got_net.add_node(src, src, title=src, value=wn1, color=c)
got_net.add_node(dst, dst, title=dst, value=wn2, color=c)
got_net.add_edge(src, dst, label=we, value=we, color=c)
neighbor_map = got_net.get_adj_list()
edges = got_net.get_edges()
nodes=got_net.get_nodes()
N_nodes=len(nodes)
N_edges=len(edges)
weights=[[] for i in range(N_nodes)]
#Associating weights to neighbors
for i in range(N_nodes): #Loop through nodes
for neighbor in neighbor_map[nodes[i]]: #and neighbors
for j in range(N_edges): #associate weights to the edge between node and neighbor
if (edges[j]['from']==nodes[i] and edges[j]['to']==neighbor) or \
(edges[j]['from']==neighbor and edges[j]['to']==nodes[i]):
weights[i].append(edges[j]['value'])
for node,i in zip(got_net.nodes,range(N_nodes)):
node['value']=len(neighbor_map[node['id']])
node['weight']=[str(weights[i][k]) for k in range(len(weights[i]))]
list_neighbor=list(neighbor_map[node['id']])
#Concatenating neighbors and weights
hover_str=[list_neighbor[k]+' '+ node['weight'][k] for k in range(node['value'])]
#Setting up node title for hovering
node['title']+=' Neighbors:<br>'+'<br>'.join(hover_str)
got_net.show_buttons(filter_=['physics'])
got_net.show('GK_and_other_variables_GR.html')
groups_not_to_keep = ['111', '117', '118', '124', '116', '100', '101', '103', '104', '105', '106']
new_BN_df.Group = new_BN_df.Group.map(str)
all_GR_BN = new_BN_df[~new_BN_df['Group'].isin(groups_not_to_keep)]
all_GR_BN = all_GR_BN.reset_index()
import itertools
from itertools import permutations
iterable = all_GR_BN.Group.unique()
all_select = list(itertools.permutations(iterable, 2))
len(all_select)
182
grnxn = pd.DataFrame()
item_len = list()
import random
import ast  # ast.literal_eval is used inside the loop below
%%time
number = 0
for item in all_select:
select= list(item)
nndf = all_GR_BN[all_GR_BN['Group'].isin(select)]
#cps['Option'] = cps['Option']+' '+cps['Description']
sources = nndf[['id', 'Option']].copy()
len_options = len(nndf.Option.unique())
len_options
len_ids = len(nndf.id.unique()) +1
len_ids
ranges = list(range(len_ids, len_ids+len_options))
len(ranges) == len(nndf.Option.unique())
options = nndf.Option.unique()
options
# get categorical codes
categories = dict(zip(options,ranges))
categories
sources['codes'] = sources['Option'].map(categories)
xtt=pd.DataFrame()
xtt = sources[['Option', 'codes']].copy()
# get source codes and counts
sources['codes'] = sources['codes'].map(str)
counts = sources.groupby(["id"])["codes"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+counts['codes'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
counts['xcodes'] = nx.iloc[:,2]
gcounts = sources.groupby(["id"])["Option"].agg(lambda x: """','""".join(x[x != ''])).reset_index()
nx = ("""'"""+gcounts['Option'].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
gcounts['xoption'] = nx.iloc[:,2]
gcounts
lel = pd.merge(counts, gcounts, on='id')
del lel['codes']
del lel['Option']
lel
# writing operations
wo = []
for i in range(len(counts['xcodes'])) :
wo.append(pd.Series(counts.iloc[i, 2]).apply(ast.literal_eval).apply(lambda x: pd.Series(x)).stack().value_counts())
# value counts df
vc = pd.DataFrame(wo)
# counts
cxounts = pd.concat([lel, vc], axis=1)
lex = cxounts.set_index(['id','xcodes', 'xoption']).stack().reset_index()
lex['counts'] = lex[0]
lex['codes'] = lex['level_3']
del lex[0]
del lex['level_3']
# paths
lex['path'] = """'""" + lex["id"].astype(str)+"',"+lex["xcodes"]
lex['label'] = """'""" + lex["id"].astype(str)+"',"+lex["xoption"]
lex['path'] = lex['path'].str.replace("""'""", '')
lex['label'] = lex['label'].str.replace("""'""", '')
lex.head(2)
lex["counts"] = lex["counts"].map(int)
## paths and sources
path_list = list(lex.path.unique())
label_list = list(lex.xoption.unique())
############################################## corrected code
def zigzag(seq):
"""Return consecutive (source, target) pairs from each path in `seq`"""
seq_int = [list(map(int, x)) for x in seq]
x = []
y = []
for i in seq_int:
for j, k in zip(i, i[1:]):
x.append(j)
y.append(k)
return list(zip(x, y))
# get a path graph
y = []
for i in range(len(path_list)):
y.append(list(path_list[i].split(',')))
big_list = zigzag(y)
#### MOST COMMON PATH
c_path = pd.DataFrame(big_list)
c_path = c_path[c_path[0].isin(ranges)] #remove the participant id initials
c_path[2] = c_path[0]
c_path[0] = '1'
c_path
########################## edit here
tagged = c_path.groupby([1, 2])[0].agg(lambda x: """','""".join(x[x != ''])).reset_index()
xtagged= ("""'"""+tagged[0].astype(str)+"""'""").apply(lambda x: pd.Series(x)).stack().reset_index() # convert string to series
xtagged['counts'] = [len(x.split(',')) for x in xtagged[0].tolist()]
ztagged = pd.concat([tagged, xtagged], axis=1)
ztagged
####
inv_map = {str(v): str(k) for k, v in categories.items()}
fif = ztagged[[1, 2, 0, 'counts']]
fif[1] = fif[1].map(str)
fif[3] = fif[1].map(inv_map)
fif[2] = fif[2].map(str)
fif[4] = fif[2].map(inv_map)
del fif[0]
fif['label'] = fif[3] + ' ' + fif[4]
fif[1] = fif[1].map(int)
fif[2] = fif[2].map(int)
fif
fif['connections'] = fif.iloc[:,0].astype(str)+" "+fif.iloc[:,1].astype(str)
cls = pd.DataFrame()
cls['connections'] = pd.DataFrame(fif['connections'].unique())
import random
# generate random colours
amount = len(fif['connections'].unique())
colour = []
for i in range(0, amount):
colour.append("#%06x" % random.randint(i, 0xFFFFFF))
cls['colour'] = colour
fif = pd.merge(fif, cls, on='connections')
fif
def nodify(node_names):
# unique name beginnings
ends = sorted(list(set([e[0] for e in node_names])))
# intervals
steps = 1/4
# x-values for each unique name ending
# for input as node position
nodes_x = {}
xVal = 0
for e in ends:
nodes_x[str(e)] = xVal
xVal += steps
# x and y values in list form
x_values = [nodes_x[n[0]] for n in node_names]
y_values = [x*0.03 for x in range(1, len(x_values))]
return x_values, y_values
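The layout idea behind `nodify` (one x column per distinct first element, evenly spaced, with y staggered to avoid overlap) can be sketched as a standalone helper; the name and inputs below are illustrative only:

```python
def column_positions(node_names, step=0.25):
    # One x column per distinct first element, in sorted order, spaced by `step`.
    firsts = sorted({n[0] for n in node_names})
    x_of = {f: i * step for i, f in enumerate(firsts)}
    x_values = [x_of[n[0]] for n in node_names]
    # Stagger y so nodes in the same column do not overlap.
    y_values = [0.03 * (i + 1) for i in range(len(node_names))]
    return x_values, y_values

xs, ys = column_positions(['a1', 'a2', 'b1'])
```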
# map colours to categories
import random
# generate one random hex colour per unique name
amount = len(npaths['name'].unique())
colour = []
for i in range(amount):
    colour.append("#%06x" % random.randint(0, 0xFFFFFF))
####### GET SOME SIGNIFICANT PATHS, options occurring together
nfif = fif[fif['counts'] > 0]
nfif
pax = pd.DataFrame(nndf).reset_index()
pax['id'] = 1  # column assignment; `pax.id = 1` would only set an attribute, not create a column
pax.drop('index', axis=1, inplace=True)
pax = pax.groupby('Option')['id'].sum().reset_index()
pax.columns = [3, 'id']
nxn = pd.merge(nfif, pax, on=3)
pax.columns = [4, 'idx']
rnxn = pd.merge(nxn, pax, on=4)
rnxn['p1'] = rnxn['counts']/rnxn['id']
rnxn['p2'] = rnxn['counts']/rnxn['idx']
rnxn['p1p2'] = rnxn['p1']*rnxn['p2']
#rnxn = rnxn[rnxn['p1p2'] >= .05]
rnxn.sort_values(['p1p2'], ascending=False, inplace=True)
grnxn = pd.concat([grnxn, rnxn], axis=0)
CPU times: user 5min 29s, sys: 223 ms, total: 5min 29s Wall time: 5min 29s
grnxn = grnxn.sort_values(['counts', 'p1p2'], ascending=[False, False]).drop_duplicates(subset=[3, 4], keep='last')
grnxn
| | 1 | 2 | counts | 3 | 4 | label | connections | colour | id | idx | p1 | p2 | p1p2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 776 | 775 | 533 | Students | Younger Participants | Students Younger Participants | 776 775 | #7f1625 | 561 | 599 | 0.950089 | 0.889816 | 0.845405 |
| 4 | 776 | 775 | 409 | Low GK Confidence | Younger Participants | Low GK Confidence Younger Participants | 776 775 | #6116bf | 519 | 599 | 0.788054 | 0.682805 | 0.538087 |
| 3 | 777 | 774 | 399 | Younger Participants | Low GK Score | Younger Participants Low GK Score | 777 774 | #ecb50b | 599 | 496 | 0.666110 | 0.804435 | 0.535843 |
| 2 | 776 | 774 | 382 | Students | Low GK Confidence | Students Low GK Confidence | 776 774 | #7b32f3 | 561 | 519 | 0.680927 | 0.736031 | 0.501183 |
| 2 | 776 | 774 | 381 | Students | Low GK Score | Students Low GK Score | 776 774 | #300da3 | 561 | 496 | 0.679144 | 0.768145 | 0.521681 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 26 | 774 | 783 | 1 | Students | I am concerned my data will be used for other ... | Students I am concerned my data will be used f... | 774 783 | #d0ddce | 561 | 498 | 0.001783 | 0.002008 | 0.000004 |
| 0 | 774 | 776 | 1 | Low GK Confidence | Students | Low GK Confidence Students | 774 776 | #2e79d9 | 519 | 561 | 0.001927 | 0.001783 | 0.000003 |
| 26 | 775 | 783 | 1 | Younger Participants | I am concerned my data will be used for other ... | Younger Participants I am concerned my data wi... | 775 783 | #449b3d | 599 | 498 | 0.001669 | 0.002008 | 0.000003 |
| 0 | 775 | 776 | 1 | Younger Participants | Low GK Confidence | Younger Participants Low GK Confidence | 775 776 | #18a1cd | 599 | 519 | 0.001669 | 0.001927 | 0.000003 |
| 0 | 775 | 776 | 1 | Younger Participants | Students | Younger Participants Students | 775 776 | #e51bd8 | 599 | 561 | 0.001669 | 0.001783 | 0.000003 |
1547 rows × 13 columns
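The strength score `p1p2` multiplies two conditional proportions: the share of the source category's occurrences that co-occur with the target, and vice versa. Recomputing the top row of the table above (Students / Younger Participants) as a sanity check:

```python
# Counts taken from the first row of the grnxn output above.
counts = 533   # co-occurrences of "Students" and "Younger Participants"
n_src = 561    # total occurrences of "Students" (id)
n_dst = 599    # total occurrences of "Younger Participants" (idx)

p1 = counts / n_src   # P(target | source occurrence)
p2 = counts / n_dst   # P(source | target occurrence)
p1p2 = p1 * p2        # symmetric association strength in [0, 1]
```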
grnxn.to_csv('Thesis_variables.tsv', index=False, sep='\t')
data1 = grnxn['p1p2']
X1, Y1 = calc_curve(data1)
traces = []
traces.append({'x': X1, 'y': Y1, 'name': 'Thesis Variables'})
plot({'data': traces})
'temp-plot.html'
%matplotlib inline
import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')
# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0
    # Best holders
    best_distributions = []
    # Estimate distribution parameters from data
    for ii, distribution in enumerate([d for d in _distn_names if d not in ['levy_stable', 'studentized_range']]):
        print("{:>3} / {:<3}: {}".format(ii + 1, len(_distn_names), distribution))
        distribution = getattr(st, distribution)
        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore')
                # fit dist to data
                params = distribution.fit(data)
                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]
                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
                sse = np.sum(np.power(y - pdf, 2.0))
                # if an axis was passed in, add this fit to the plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass
                # keep this distribution's fit for ranking
                best_distributions.append((distribution, params, sse))
        except Exception:
            pass
    # sort by sum of squared errors, best fit first
    return sorted(best_distributions, key=lambda x: x[2])
def make_pdf(dist, params, size=10000):
    """Generate the distribution's probability density function as a Series."""
    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]
    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)
    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, loc=loc, scale=scale, *arg)
    pdf = pd.Series(y, x)
    return pdf
# Use the connection-strength scores computed above as the data to fit
data = grnxn['p1p2']
# Plot for comparison
plt.figure(figsize=(12,8))
ax = data.plot(kind='hist', bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams['axes.prop_cycle'])[1]['color'])
# Save plot limits
dataYLim = ax.get_ylim()
# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]
# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u'Strength of connection.\n All Fitted Distributions')
ax.set_xlabel(u'Strength (P1*P2)')
ax.set_ylabel('Frequency')
# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])
# Display
plt.figure(figsize=(12,8))
ax = pdf.plot(lw=2, label='PDF', legend=True)
data.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Data', legend=True, ax=ax)
param_names = (best_dist[0].shapes + ', loc, scale').split(', ') if best_dist[0].shapes else ['loc', 'scale']
param_str = ', '.join(['{}={:0.2f}'.format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = '{}({})'.format(best_dist[0].name, param_str)
ax.set_title(u'Strength with best fit distribution \n' + dist_str)
ax.set_xlabel(u'Strength (P1*P2)')
ax.set_ylabel('Frequency')
(fitting progress output truncated: 102 lines, one per candidate scipy distribution, from 1 / 104: ksone through 102 / 104: argus)
from pyvis.network import Network
import pandas as pd
got_net = Network(height='1080px', width='100%', bgcolor='#ffffff', font_color='black', directed=False)
# set the physics layout of the network
# got_net.barnes_hut()
got_data = grnxn
got_data = got_data[got_data['p1p2'] >= 0.2]
sources = got_data[3]
targets = got_data[4]
weights_edges = got_data['p1p2'].round(3)
weights_n1 = got_data['p1'].round(3)
weights_n2 = got_data['p2'].round(3)
colours = got_data['colour']
edge_data = zip(sources, targets, weights_edges, weights_n1, weights_n2, colours)
for src, dst, we, wn1, wn2, c in edge_data:
    got_net.add_node(src, src, title=src, value=wn1, color=c)
    got_net.add_node(dst, dst, title=dst, value=wn2, color=c)
    got_net.add_edge(src, dst, label=we, value=we, color=c)
neighbor_map = got_net.get_adj_list()
edges = got_net.get_edges()
nodes = got_net.get_nodes()
N_nodes = len(nodes)
N_edges = len(edges)
weights = [[] for i in range(N_nodes)]
# Associating weights to neighbors
for i in range(N_nodes):  # loop through nodes
    for neighbor in neighbor_map[nodes[i]]:  # and their neighbors
        for j in range(N_edges):  # find the edge between node and neighbor
            if (edges[j]['from'] == nodes[i] and edges[j]['to'] == neighbor) or \
               (edges[j]['from'] == neighbor and edges[j]['to'] == nodes[i]):
                weights[i].append(edges[j]['value'])
for node, i in zip(got_net.nodes, range(N_nodes)):
    node['value'] = len(neighbor_map[node['id']])
    node['weight'] = [str(weights[i][k]) for k in range(len(weights[i]))]
    list_neighbor = list(neighbor_map[node['id']])
    # Concatenating neighbors and weights
    hover_str = [list_neighbor[k] + ' ' + node['weight'][k] for k in range(node['value'])]
    # Setting up node title for hovering
    node['title'] += ' Neighbors:<br>' + '<br>'.join(hover_str)
got_net.show_buttons(filter_=['physics'])
got_net.show('Thesis_variables_GR.html')
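The weight-association loop above scans every edge once per node-neighbour pair (O(N·E)). The same neighbour-to-weight mapping can be built in a single pass over the edges with a plain dict; this is an illustrative standalone sketch, independent of pyvis:

```python
from collections import defaultdict

def neighbor_weights(edges):
    # edges: iterable of (src, dst, weight); the graph is undirected,
    # so each edge is recorded under both endpoints.
    wmap = defaultdict(dict)
    for src, dst, w in edges:
        wmap[src][dst] = w
        wmap[dst][src] = w
    return wmap

wmap = neighbor_weights([('A', 'B', 0.5), ('B', 'C', 0.2)])
```

Each node's hover string can then be built directly from `wmap[node]` without re-scanning the edge list.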